Abstract

We derive the asymptotic distribution of the domination number of a new family of random digraphs called proximity catch digraphs (PCDs), which have applications to statistical testing of spatial point patterns and to pattern recognition. The PCD we use is a parametrized digraph based on two sets of points in the plane, where the sample size and locations of the elements of one set are held fixed, while the sample size of the other set, whose elements are randomly distributed over a region of interest, goes to infinity. PCDs are constructed based on the relative allocation of the random set of points with respect to the Delaunay triangulation of the other set, whose size and locations are fixed. We introduce various auxiliary tools and concepts for the derivation of the asymptotic distribution. We investigate these concepts in one Delaunay triangle in the plane, and then extend them to the multiple triangle case. The methods are illustrated for planar data, but are also applicable in higher dimensions.

1 Introduction



The proximity catch digraphs (PCDs) are a special type of proximity graphs, which were introduced by Toussaint (1980). A digraph is a directed graph with vertices $V$ and arcs (directed edges), each of which is from one vertex to another based on a binary relation. Then the pair $(p,q) \in V \times V$ is an ordered pair which stands for an arc (directed edge) from vertex $p$ to vertex $q$. For example, the nearest neighbor (di)graph of Paterson and Yao (1992) is a proximity digraph. The nearest neighbor digraph has the vertex set $V$ and $(p,q)$ as an arc iff $q$ is a nearest neighbor of $p$.

Our PCDs are based on the proximity maps, which are defined in a fairly general setting. Let $(\Omega,\mathcal{M})$ be a measurable space. The proximity map $N(\cdot)$ is defined as $N : \Omega \to 2^{\Omega}$, where $2^{\Omega}$ is the power set of $\Omega$. The proximity region of $x \in \Omega$, denoted $N(x)$, is the image of $x \in \Omega$ under $N(\cdot)$. The points in $N(x)$ are thought of as being "closer" to $x \in \Omega$ than are the points in $\Omega \setminus N(x)$. Hence the term "proximity" in the name proximity catch digraph. Proximity maps are the building blocks of the proximity graphs of Toussaint (1980); an extensive survey on proximity maps and graphs is available in Jaromczyk and Toussaint (1992).

The proximity catch digraph $D$ has the vertex set $V = \{p_1,\ldots,p_n\}$, and the arc set $A$ is defined by $(p_i,p_j) \in A$ iff $p_j \in N(p_i)$ for $i \neq j$. Notice that the proximity catch digraph $D$ depends on the proximity map $N(\cdot)$, and if $p_j \in N(p_i)$, then we say that $N(p_i)$ (and hence point $p_i$) catches $p_j$. Hence the term "catch" in the name proximity catch digraph. If arcs of the form $(p_j,p_j)$ (i.e., loops) were allowed, $D$ would have been called a pseudodigraph according to some authors (see, e.g., Chartrand and Lesniak (1996)).

In a digraph $D = (V,A)$, a vertex $v \in V$ dominates itself and all vertices of the form $\{u : (v,u) \in A\}$. A dominating set $S_D$ for the digraph $D$ is a subset of $V$ such that each vertex $v \in V$ is dominated by a vertex in $S_D$. A minimum dominating set $S^*_D$ is a dominating set of minimum cardinality, and the domination number $\gamma(D)$ is defined as $\gamma(D) := |S^*_D|$ (see, e.g., Lee (1998)), where $|\cdot|$ denotes the set cardinality functional. See Chartrand and Lesniak (1996) and West (2001) for more on graphs and digraphs. If a minimum dominating set is of size one, we call it a dominating point.

Note that for $|V| = n > 0$, $1 \le \gamma(D) \le n$, since $V$ itself is always a dominating set.
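The definition of the domination number can be made concrete with a small exhaustive search. The sketch below is only an illustration under our own assumptions: the function name `domination_number` and the toy digraph are hypothetical, and the brute-force search is exponential in the number of vertices, so it is meant for very small examples only.

```python
from itertools import combinations

def domination_number(vertices, arcs):
    """Return (gamma, S) for the digraph (vertices, arcs) by exhaustive search."""
    vertices = list(vertices)
    # Each vertex dominates itself and the heads of its outgoing arcs.
    dominated_by = {v: {v} for v in vertices}
    for tail, head in arcs:
        dominated_by[tail].add(head)
    everything = set(vertices)
    # Try subsets in increasing size; the first dominating one is minimum.
    for k in range(1, len(vertices) + 1):
        for subset in combinations(vertices, k):
            covered = set().union(*(dominated_by[v] for v in subset))
            if covered == everything:
                return k, set(subset)
    return 0, set()  # only reached for an empty vertex set

# Toy digraph (hypothetical): arcs 0->1, 0->2 and 3->0.
gamma, dom_set = domination_number(range(4), [(0, 1), (0, 2), (3, 0)])
```

In the toy digraph no single vertex dominates all four vertices, so the search continues to pairs; a digraph without arcs attains the upper bound $\gamma(D) = n$.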

In recent years, a new classification tool based on the relative allocation of points from various classes has been developed. Priebe et al. (2001) introduced the class cover catch digraphs (CCCDs) and gave the exact and the asymptotic distribution of the domination number of the CCCD based on two sets, $\mathcal{X}_n$ and $\mathcal{Y}_m$, which are of size $n$ and $m$, from classes $\mathcal{X}$ and $\mathcal{Y}$, respectively, and are sets of iid random variables from the uniform distribution on a compact interval in $\mathbb{R}$. DeVinney and Priebe (2006), DeVinney et al. (2002), Marchette and Priebe (2003), and Priebe et al. (2003a,b) applied the concept in higher dimensions and demonstrated relatively good performance of CCCD in classification. The methods employed involve data reduction (condensing) by using approximate minimum dominating sets as prototype sets, since finding the exact minimum dominating set is an NP-hard problem in general, e.g., for CCCD in multiple dimensions (see DeVinney (2003)). DeVinney and Wierman (2003) proved a SLLN result for the domination number of CCCDs for one-dimensional data. Although intuitively appealing and easy to extend to higher dimensions, the exact and asymptotic distributions of the domination number of the CCCDs are not analytically tractable in $\mathbb{R}^2$ or higher dimensions. As alternatives to CCCD, two new families of PCDs are introduced in Ceyhan and Priebe (2003, 2005) and are applied in testing spatial point patterns (see Ceyhan et al. (2005, 2006)). These new families are both applicable to pattern classification also. They are designed to have better distributional and mathematical properties. For example, the distribution of the relative density (of arcs) is derived for one family in Ceyhan et al. (2005) and for the other family in Ceyhan et al. (2006). In this article, we derive the asymptotic distribution of the domination number of the latter family, called the $r$-factor proportional-edge PCD. During the derivation process, we introduce auxiliary tools, such as the proximity region (which is the most crucial concept in defining the PCD), the $\Gamma_1$-region, the superset region, closest edge extrema, asymptotically accurate distribution, and so on. We utilize these special regions, extrema, and asymptotic expansion of the distribution of these extrema. The choice of the change of variables in the asymptotic expansion is also dependent on the type of the extrema used and is crucial in finding the limits of the improper integrals we encounter. Our methodology is instructive in finding the distribution of the domination number of similar PCDs in $\mathbb{R}^2$ or higher dimensions.

In addition to the mathematical tractability and applicability to testing spatial patterns and classification, this new family of PCDs is more flexible as it allows choosing an optimal parameter for best performance in hypothesis testing or pattern classification.

The domination number of PCDs is first investigated for data in one Delaunay triangle (in $\mathbb{R}^2$) and the analysis is generalized to data in multiple Delaunay triangles. Some trivial proofs are omitted, shorter proofs are given in the main body of the article, and longer proofs are deferred to the Appendix.

2 Proximity Maps and the Associated PCDs 

We construct the proximity regions using two data sets $\mathcal{X}_n$ and $\mathcal{Y}_m$ from two classes $\mathcal{X}$ and $\mathcal{Y}$, respectively. Given $\mathcal{Y}_m \subseteq \Omega$, the proximity map $N_{\mathcal{Y}}(\cdot) : \Omega \to 2^{\Omega}$ associates a proximity region $N_{\mathcal{Y}}(x) \subseteq \Omega$ with each point $x \in \Omega$. The region $N_{\mathcal{Y}}(x)$ is defined in terms of the distance between $x$ and $\mathcal{Y}_m$. More specifically, our $r$-factor proximity maps will be based on the relative position of points from $\mathcal{X}_n$ with respect to the Delaunay tessellation of $\mathcal{Y}_m$. In this article, a triangle refers to the closed region bounded by its edges. See Figure 1 for an example with $n = 200$ $\mathcal{X}$ points iid $\mathcal{U}((0,1) \times (0,1))$, the uniform distribution on the unit square, where the Delaunay triangulation is based on $m = 10$ $\mathcal{Y}$ points which are also iid $\mathcal{U}((0,1) \times (0,1))$.




Figure 1: A realization of 200 $\mathcal{X}$ points (crosses) and the Delaunay triangulation based on 10 $\mathcal{Y}$ points (circles).

If $\mathcal{X}_n = \{X_1,\ldots,X_n\}$ is a set of $\Omega$-valued random variables, then the $N_{\mathcal{Y}}(X_i)$ are random sets. If the $X_i$ are iid, then so are the random sets $N_{\mathcal{Y}}(X_i)$. We define the data-random proximity catch digraph $D$, associated with $N_{\mathcal{Y}}(\cdot)$, with vertex set $\mathcal{X}_n = \{X_1,\ldots,X_n\}$ and arc set $A$ by

$(X_i,X_j) \in A \iff X_j \in N_{\mathcal{Y}}(X_i)$.

Since this relationship is not symmetric, a digraph is used rather than a graph. The random digraph $D$ depends on the (joint) distribution of the $X_i$ and on the map $N_{\mathcal{Y}}(\cdot)$. For $\mathcal{X}_n = \{X_1,\ldots,X_n\}$, a set of iid random variables from $F$, the domination number of the associated data-random proximity catch digraph based on the proximity map $N(\cdot)$, denoted $\gamma(\mathcal{X}_n,N)$, is the minimum number of point(s) that dominate all points in $\mathcal{X}_n$.

The random variable $\gamma(\mathcal{X}_n,N)$ depends explicitly on $\mathcal{X}_n$ and $N(\cdot)$ and implicitly on $F$. Furthermore, in general, the distribution, hence the expectation $E[\gamma(\mathcal{X}_n,N)]$, depends on $n$, $F$, and $N$; $1 \le E[\gamma(\mathcal{X}_n,N)] \le n$. In general, the variance of $\gamma(\mathcal{X}_n,N)$ satisfies $0 \le \mathrm{Var}[\gamma(\mathcal{X}_n,N)] \le n^2/4$.

For example, the CCCD of Priebe et al. (2001) can be viewed as an example of PCDs and is briefly discussed in the next section. We use many of the properties of CCCD in $\mathbb{R}$ as guidelines in defining PCDs in higher dimensions.



2.1 Spherical Proximity Maps 



Let $\mathcal{Y}_m = \{y_1,\ldots,y_m\} \subset \mathbb{R}$. Then the proximity map associated with CCCD is defined as the open ball $N_S(x) := B(x,r(x))$ for all $x \in \mathbb{R}$, where $r(x) := \min_{y \in \mathcal{Y}_m} d(x,y)$ (see Priebe et al. (2001)), with $d(x,y)$ being the Euclidean distance between $x$ and $y$. That is, there is an arc from $X_i$ to $X_j$ iff there exists an open ball centered at $X_i$ which is "pure" (i.e., contains no elements) of $\mathcal{Y}_m$ in its interior, and simultaneously contains (or "catches") point $X_j$. We consider the closed ball, $\overline{B}(x,r(x))$, for $N_S(x)$ in this article. Then for $x \in \mathcal{Y}_m$, we have $N_S(x) = \{x\}$. Notice that a ball is a sphere in higher dimensions, hence the notation $N_S$. Furthermore, dependence on $\mathcal{Y}_m$ is through $r(x)$. Note that in $\mathbb{R}$ this proximity map is based on the intervals $I_j = (y_{(j-1):m}, y_{j:m})$, where $y_{j:m}$ is the $j$th order statistic in $\mathcal{Y}_m$. This interval partitioning can be viewed as the Delaunay tessellation of $\mathbb{R}$ based on $\mathcal{Y}_m$. So in higher dimensions, we use the Delaunay triangulation based on $\mathcal{Y}_m$ to partition the support.

A natural extension of the proximity region $N_S(x)$ to $\mathbb{R}^d$ with $d > 1$ is obtained as $N_S(x) := B(x,r(x))$ where $r(x) := \min_{y \in \mathcal{Y}_m} d(x,y)$, which is called the spherical proximity map. The spherical proximity map $N_S(x)$ is well-defined for all $x \in \mathbb{R}^d$ provided that $\mathcal{Y}_m \neq \emptyset$. Extensions to $\mathbb{R}^2$ and higher dimensions with the spherical proximity map, with applications in classification, are investigated by DeVinney and Priebe (2006), DeVinney et al. (2002), Marchette and Priebe (2003), and Priebe et al. (2003a,b). However, finding the minimum dominating set of CCCD (i.e., the PCD associated with $N_S(\cdot)$) is an NP-hard problem and the distribution of the domination number is not analytically tractable for $d > 1$. This drawback has motivated us to define new types of proximity maps. Ceyhan and Priebe (2005) introduced the $r$-factor proportional-edge PCD, where the distribution of the domination number of the $r$-factor PCD with $r = 3/2$ is used in testing spatial patterns of segregation or association. Ceyhan et al. (2006) computed the asymptotic distribution of the relative density of the $r$-factor PCD and used it for the same purpose. Ceyhan and Priebe (2003) introduced the central similarity proximity maps and the associated PCDs, and Ceyhan et al. (2005) computed the asymptotic distribution of the relative density of the parametrized version of the central similarity PCDs and applied the method to testing spatial patterns. An extensive treatment of the PCDs based on Delaunay tessellations is available in Ceyhan (2004).



The following property (which is referred to as Property (1)) of CCCDs in $\mathbb{R}$ plays an important role in defining proximity maps in higher dimensions.

Property (1) For $x \in I_j$, $N_S(x)$ is a proper subset of $I_j$ for almost all $x \in I_j$. (1)



In fact, Property (1) holds for all $x \in I_j \setminus \{(y_{(j-1):m} + y_{j:m})/2\}$ for CCCDs in $\mathbb{R}$. For $x \in I_j$, $N_S(x) = I_j$ iff $x = (y_{(j-1):m} + y_{j:m})/2$. We define an associated region for such points in the general context. The superset region for any proximity map $N(\cdot)$ in $\Omega$ is defined to be

$\mathscr{R}_S(N) := \{x \in \Omega : N(x) = \Omega\}$.

For example, for $\Omega = I_j \subset \mathbb{R}$, $\mathscr{R}_S(N_S) := \{x \in I_j : N_S(x) = I_j\} = \{(y_{(j-1):m} + y_{j:m})/2\}$, and for $\Omega = T_j \subset \mathbb{R}^d$, $\mathscr{R}_S(N_S) := \{x \in T_j : N_S(x) = T_j\}$, where $T_j$ is the $j$th Delaunay cell in the Delaunay tessellation. Note that for $x \in I_j$, $\lambda(N_S(x)) \le \lambda(I_j)$ and $\lambda(N_S(x)) = \lambda(I_j)$ iff $x \in \mathscr{R}_S(N_S)$, where $\lambda(\cdot)$ is the Lebesgue measure on $\mathbb{R}$. So the proximity region of a point in $\mathscr{R}_S(N_S)$ has the largest Lebesgue measure. Note also that given $\mathcal{Y}_m$, $\mathscr{R}_S(N_S)$ is not a random set, but $\mathbf{I}(X \in \mathscr{R}_S(N_S))$ is a random variable, where $\mathbf{I}(\cdot)$ stands for the indicator function. Property (1) also implies that $\mathscr{R}_S(N_S)$ has zero $\mathbb{R}$-Lebesgue measure.

Furthermore, given a set $B$ of size $n$ in $[y_{1:m}, y_{m:m}] \setminus \mathcal{Y}_m$, the number of disconnected components in the PCD based on $N_S(\cdot)$ is at least the cardinality of the set $\{j \in \{1,2,\ldots,m\} : B \cap I_j \neq \emptyset\}$, which is the set of indices of the intervals that contain some point(s) from $B$.

Since the distribution of the domination number of spherical PCD (or CCCD) is tractable in $\mathbb{R}$, but not in $\mathbb{R}^d$ with $d > 1$, we try to mimic its properties in $\mathbb{R}$ while defining new PCDs in higher dimensions.



3 The r-Factor Proportional-Edge Proximity Maps 



First, we describe the construction of the $r$-factor proximity maps and regions, then state some of their basic properties and introduce some auxiliary tools.

3.1 Construction of the Proximity Map



Let $\mathcal{Y}_m = \{y_1,\ldots,y_m\}$ be $m$ points in general position in $\mathbb{R}^d$ and $T_j$ be the $j$th Delaunay cell for $j = 1,\ldots,J_m$, where $J_m$ is the number of Delaunay cells. Let $\mathcal{X}_n$ be a set of iid random variables from distribution $F$ in $\mathbb{R}^d$ with support $\mathcal{S}(F) \subseteq \mathcal{C}_H(\mathcal{Y}_m)$, the convex hull of $\mathcal{Y}_m$.

In particular, for illustrative purposes, we focus on $\mathbb{R}^2$, where a Delaunay tessellation is a triangulation, provided that no more than three points in $\mathcal{Y}_m$ are cocircular (i.e., lie on the same circle). Furthermore, for simplicity, let $\mathcal{Y}_3 = \{y_1,y_2,y_3\}$ be three non-collinear points in $\mathbb{R}^2$ and $T(\mathcal{Y}_3) = T(y_1,y_2,y_3)$ be the triangle with vertices $\mathcal{Y}_3$. Let $\mathcal{X}_n$ be a set of iid random variables from $F$ with support $\mathcal{S}(F) \subseteq T(\mathcal{Y}_3)$. If $F = \mathcal{U}(T(\mathcal{Y}_3))$, a composition of translation, rotation, reflections, and scaling will take any given triangle $T(\mathcal{Y}_3)$ to the basic triangle $T_b = T((0,0),(1,0),(c_1,c_2))$ with $0 < c_1 \le 1/2$, $c_2 > 0$, and $(1-c_1)^2 + c_2^2 \le 1$, preserving uniformity. That is, if $X \sim \mathcal{U}(T(\mathcal{Y}_3))$ is transformed in the same manner to, say, $X'$, then we have $X' \sim \mathcal{U}(T_b)$.

For $r \in [1,\infty]$, define $N_{PE}^r(\cdot,M) := N(\cdot,M;r,\mathcal{Y}_3)$ to be the $r$-factor proportional-edge proximity map with $M$-vertex regions as follows (see also Figure 2 with $M = M_C$ and $r = 2$). For $x \in T(\mathcal{Y}_3) \setminus \mathcal{Y}_3$, let $v(x) \in \mathcal{Y}_3$ be the vertex whose region contains $x$; i.e., $x \in R_M(v(x))$. In this article $M$-vertex regions are constructed by the lines joining any point $M \in \mathbb{R}^2 \setminus \mathcal{Y}_3$ to a point on each of the edges of $T(\mathcal{Y}_3)$. Preferably, $M$ is selected to be in the interior of the triangle, $T(\mathcal{Y}_3)^o$. For such an $M$, the corresponding vertex regions can be defined using the line segment joining $M$ to $e_j$, which lies on the line joining $y_j$ to $M$; e.g., see Figure 3 (left) for vertex regions based on the center of mass $M_C$, and (right) the incenter $M_I$. With $M_C$, the lines joining $M$ and $\mathcal{Y}_3$ are the median lines, which cross the edges at $M_j$ for $j = 1,2,3$. $M$-vertex regions, among many possibilities, can also be defined by the orthogonal projections from $M$ to the edges. See Ceyhan (2004) for a more general definition. The vertex regions in Figure 2 are center of mass vertex regions or CM-vertex regions. If $x$ falls on the boundary of two $M$-vertex regions, we assign $v(x)$ arbitrarily. Let $e(x)$ be the edge of $T(\mathcal{Y}_3)$ opposite of $v(x)$. Let $\ell(v(x),x)$ be the line parallel to $e(x)$ through $x$. Let $d(v(x),\ell(v(x),x))$ be the Euclidean (perpendicular) distance from $v(x)$ to $\ell(v(x),x)$. For $r \in [1,\infty)$, let $\ell_r(v(x),x)$ be the line parallel to $e(x)$ such that

$d(v(x),\ell_r(v(x),x)) = r\,d(v(x),\ell(v(x),x))$ and $d(\ell(v(x),x),\ell_r(v(x),x)) < d(v(x),\ell_r(v(x),x))$.

Let $T_r(x)$ be the triangle similar to and with the same orientation as $T(\mathcal{Y}_3)$ having $v(x)$ as a vertex and $\ell_r(v(x),x)$ as the opposite edge. Then the $r$-factor proportional-edge proximity region $N_{PE}^r(x,M)$ is defined to be $T_r(x) \cap T(\mathcal{Y}_3)$. Notice that $\ell(v(x),x)$ divides the edges of $T_r(x)$ (other than the one lying on $\ell_r(v(x),x)$) proportionally with the factor $r$. Hence the name $r$-factor proportional-edge proximity region.




Figure 2: Construction of the $r$-factor proximity region, $N_{PE}^{r=2}(x)$ (shaded region).

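Notice that $r \ge 1$ implies $x \in N_{PE}^r(x,M)$ for all $x \in T(\mathcal{Y}_3)$. Furthermore, $\lim_{r \to \infty} N_{PE}^r(x,M) = T(\mathcal{Y}_3)$ for all $x \in T(\mathcal{Y}_3) \setminus \mathcal{Y}_3$, so we define $N_{PE}^{\infty}(x,M) = T(\mathcal{Y}_3)$ for all such $x$. For $x \in \mathcal{Y}_3$, we define $N_{PE}^r(x,M) = \{x\}$ for all $r \in [1,\infty]$.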
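The construction above can be sketched in a few lines for the special case of CM-vertex regions. The code below is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, $M = M_C$ is assumed (so that $v(x)$ is the vertex with the largest barycentric coordinate of $x$), and $x$ is assumed not to be a vertex of the triangle. It uses the fact that $d(v(x),\ell(v(x),x))/d(v(x),e(x))$ equals one minus the barycentric coordinate of $x$ at $v(x)$.

```python
def barycentric(tri, p):
    """Barycentric coordinates of point p with respect to triangle tri."""
    (x1, y1), (x2, y2), (x3, y3) = tri
    det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    l1 = ((y2 - y3) * (p[0] - x3) + (x3 - x2) * (p[1] - y3)) / det
    l2 = ((y3 - y1) * (p[0] - x3) + (x1 - x3) * (p[1] - y3)) / det
    return (l1, l2, 1.0 - l1 - l2)

def pe_proximity_region(tri, x, r):
    """Vertices of N_PE^r(x, M_C), listed starting from v(x); x must not be a vertex."""
    lam = barycentric(tri, x)
    j = max(range(3), key=lambda k: lam[k])  # index of the CM-vertex region containing x
    v = tri[j]
    others = [tri[k] for k in range(3) if k != j]
    # Similarity ratio of T_r(x) to the triangle, capped at 1 by the intersection.
    ratio = min(1.0, r * (1.0 - lam[j]))
    return [v] + [(v[0] + ratio * (q[0] - v[0]), v[1] + ratio * (q[1] - v[1]))
                  for q in others]

def in_region(region, z, tol=1e-12):
    """True if z lies in the (closed) triangle given by its three vertices."""
    return all(c >= -tol for c in barycentric(region, z))

tri = [(0, 0), (1, 0), (0, 1)]
region_r2 = pe_proximity_region(tri, (0.2, 0.2), 2.0)
region_r3 = pe_proximity_region(tri, (0.2, 0.2), 3.0)
```

For the point $(0.2, 0.2)$ in this right triangle, $r = 2$ gives a proper sub-triangle at the vertex $(0,0)$, while $r = 3$ already returns the whole triangle, which illustrates how the region grows with $r$.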



Figure 3: The vertex regions constructed with center of mass $M_C$ (left) and incenter $M_I$ (right) using the line segments on the line joining $M$ to $\mathcal{Y}_3$.




Hence, the $r$-factor proportional-edge PCD has vertices $\mathcal{X}_n$ and arcs $(x_i,x_j)$ iff $x_j \in N_{PE}^r(x_i,M)$. See Figure 4 for a realization of $\mathcal{X}_n$ with $n = 7$ and $m = 3$. The number of arcs is 12 and $\gamma_n(r=2,M_C) = 1$. By construction, note that as $x$ gets closer to $M$ (or equivalently further away from the vertices in vertex regions), $N_{PE}^r(x,M)$ increases in area, hence it is more likely for the outdegree of $x$ to increase. So if more $\mathcal{X}$ points are around the center $M$, then it is more likely for $\gamma_n$ to decrease; on the other hand, if more $\mathcal{X}$ points are around the vertices $\mathcal{Y}_3$, then the regions get smaller, hence it is more likely for the outdegree of such points to be smaller, thereby implying $\gamma_n$ to increase. This probabilistic behaviour is utilized in Ceyhan and Priebe (2005) for testing spatial patterns.

Note also that $N_{PE}^r(x,M)$ is a homothetic transformation (enlargement) with $r \ge 1$ applied on the region $N_{PE}^{r=1}(x,M)$. Furthermore, this transformation is also an affine similarity transformation.

3.2 Some Basic Properties and Auxiliary Concepts 

First, notice that $X_i \sim F$, with the additional assumption that the non-degenerate two-dimensional probability density function $f$ exists with support $\mathcal{S}(F) \subseteq T(\mathcal{Y}_3)$, implies that the special case in the construction of $N_{PE}^r$ (namely, $X$ falling on the boundary of two vertex regions) occurs with probability zero. Note that for such an $F$, $N_{PE}^r(X)$ is a triangle a.s.

The similarity ratio of $N_{PE}^r(x,M)$ to $T(\mathcal{Y}_3)$ is given by

$\dfrac{\min\big(d(v(x),e(x)),\; r\,d(v(x),\ell(v(x),x))\big)}{d(v(x),e(x))}$;

that is, $N_{PE}^r(x,M)$ is similar to $T(\mathcal{Y}_3)$ with the above ratio. Property (1) holds depending on the pair $M$ and $r$. That is, there exists an $r_0$ and a corresponding point $M(r_0) \in T(\mathcal{Y}_3)^o$ so that $N_{PE}^r(x,M)$ satisfies Property (1) for all $r \le r_0$, but fails to satisfy it otherwise. Property (1) fails for all $M$ when $r = \infty$. With CM-vertex regions, for all $r \in [1,\infty]$, the area $A(N_{PE}^r(x,M_C))$ is a continuous function of $d(\ell_r(v(x),x),v(x))$, which is a continuous function of $d(\ell(v(x),x),v(x))$, which is a continuous function of $x$.

Note that if $x$ is close enough to $M$, we might have $N_{PE}^r(x,M) = T(\mathcal{Y}_3)$ for $r = \sqrt{2}$ also.

In $T(\mathcal{Y}_3)$, drawing the lines $q_j(r,x)$ such that $d(y_j,e_j) = r\,d(y_j,q_j(r,x))$ for $j \in \{1,2,3\}$ yields a triangle, denoted $\mathscr{T}^r$, for $r < 3/2$. See Figure 5 for $\mathscr{T}^r$ with $r = \sqrt{2}$.

Figure 5: The triangle $\mathscr{T}^r$ with $r = \sqrt{2}$ (the hatched region).

The functional form of $\mathscr{T}^r$ in the basic triangle $T_b$ is given by

$$\mathscr{T}^r = T(t_1(r),t_2(r),t_3(r)) = \left\{(x,y) \in T_b : y \ge \frac{c_2(r-1)}{r};\; y \le \frac{c_2(1-rx)}{r(1-c_1)};\; y \le \frac{c_2(r(x-1)+1)}{r\,c_1}\right\}$$

$$= T\left(\left(\frac{(r-1)(1+c_1)}{r},\frac{c_2(r-1)}{r}\right),\left(\frac{2-r+c_1(r-1)}{r},\frac{c_2(r-1)}{r}\right),\left(\frac{c_1(2-r)+r-1}{r},\frac{c_2(2-r)}{r}\right)\right). \tag{2}$$
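As a quick numerical sanity check of Equation (2), the sketch below evaluates the vertices $t_j(r)$ and the three defining inequalities in the basic triangle. It is an illustration only: the function names are hypothetical, and the expressions are written with the denominators obtained by shrinking each edge line toward its opposite vertex by the factor $1/r$, as in the definition of the lines $q_j(r,x)$.

```python
def t_r_vertices(r, c1, c2):
    """Vertices t_1(r), t_2(r), t_3(r) of the triangle T^r in T_b = T((0,0),(1,0),(c1,c2))."""
    t1 = ((r - 1) * (1 + c1) / r, c2 * (r - 1) / r)
    t2 = ((2 - r + c1 * (r - 1)) / r, c2 * (r - 1) / r)
    t3 = ((c1 * (2 - r) + r - 1) / r, c2 * (2 - r) / r)
    return t1, t2, t3

def in_t_r(p, r, c1, c2, tol=1e-12):
    """Check the three inequalities that define T^r (closed region, small tolerance)."""
    x, y = p
    return (y >= c2 * (r - 1) / r - tol
            and y <= c2 * (1 - r * x) / (r * (1 - c1)) + tol
            and y <= c2 * (r * (x - 1) + 1) / (r * c1) + tol)

c1, c2 = 0.3, 0.8                        # an arbitrary admissible basic triangle
centroid = ((0 + 1 + c1) / 3, c2 / 3)
at_one = t_r_vertices(1.0, c1, c2)       # r = 1: T^r is the whole triangle
at_three_halves = t_r_vertices(1.5, c1, c2)  # r = 3/2: T^r collapses to a point
```

At $r = 1$ the vertices coincide with $y_1, y_2, y_3$, and at $r = 3/2$ all three collapse to the center of mass, which is consistent with $\mathscr{T}^r$ being defined for $r < 3/2$ and shrinking toward $M_C$.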

There is a crucial difference between the triangles $\mathscr{T}^r$ and $T(M_1,M_2,M_3)$. More specifically, $T(M_1,M_2,M_3) \subseteq \mathscr{R}_S(r,M)$ for all $M$ and $r \ge 2$, but $(\mathscr{T}^r)^o$ and $\mathscr{R}_S(r,M)$ are disjoint for all $M$ and $r$. So if $M \in (\mathscr{T}^r)^o$, then $\mathscr{R}_S(r,M) = \emptyset$; if $M \in \partial(\mathscr{T}^r)$, then $\mathscr{R}_S(r,M) = \{M\}$; and if $M \notin \mathscr{T}^r$, then $\mathscr{R}_S(r,M)$ has positive area. Thus $N_{PE}^r(\cdot,M)$ fails to satisfy Property (1) if $M \notin \mathscr{T}^r$. See Figure 6 for two examples of superset regions with $M$ that corresponds to the circumcenter $M_{CC}$ in this triangle, where the vertex regions are constructed using orthogonal projections. For $r = 2$, note that $\mathscr{T}^r = \emptyset$ and the superset region is $T(M_1,M_2,M_3)$ (see Figure 6 (left)), while for $r = \sqrt{2}$, $(\mathscr{T}^r)^o$ and $\mathscr{R}_S(r=\sqrt{2},M)^o$ are disjoint (see Figure 6 (right)).

The triangle $\mathscr{T}^r$ given in Equation (2) and the superset region $\mathscr{R}_S(r,M)$ play a crucial role in computing the distribution of the domination number of the $r$-factor PCD.



3.3 Main Result 

Next, we present the main result of this article. Let $\gamma_n(r,M) := \gamma(\mathcal{X}_n,N_{PE}^r,M)$ be the domination number of the PCD based on $N_{PE}^r$ with $\mathcal{X}_n$, a set of iid random variables from $\mathcal{U}(T(\mathcal{Y}_3))$, with $M$-vertex regions. The domination number $\gamma_n(r,M)$ of the PCD has the following asymptotic distribution. As $n \to \infty$,

$$\gamma_n(r,M) \sim \begin{cases} 2 + \mathrm{BER}(1-p_r), & \text{for } r \in [1,3/2] \text{ and } M \in \{t_1(r),t_2(r),t_3(r)\},\\ 1, & \text{for } r > 3/2,\\ 3, & \text{for } r \in [1,3/2) \text{ and } M \in \mathscr{T}^r \setminus \{t_1(r),t_2(r),t_3(r)\}, \end{cases} \tag{3}$$

where $\mathrm{BER}(p)$ stands for the Bernoulli distribution with probability of success $p$, $\mathscr{T}^r$ and $t_j(r)$ are defined in Equation (2), and for $r \in [1,3/2)$ and $M \in \{t_1(r),t_2(r),t_3(r)\}$,

$$p_r = \int_0^{\infty}\!\!\int_0^{\infty} \frac{64\,r^2}{9\,(r-1)^2}\, w_1 w_3 \exp\left(-\frac{4\,r}{3\,(r-1)}\left(w_1^2 + w_3^2 + 2\,r\,(r-1)\,w_1 w_3\right)\right) dw_3\, dw_1. \tag{4}$$

For example, for $r = 3/2$ and $M = M_C$, $p_r \approx 0.7413$.

Figure 6: The superset regions (the shaded regions) constructed with circumcenter $M_{CC}$ with $r = \sqrt{2}$ (left) and $r = 2$ (right), with vertex regions constructed with orthogonal projections to the edges.

Figure 7: Plotted is the probability $p_r = \lim_{n \to \infty} P(\gamma_n(r,M) = 2)$ given in Equation (4) as a function of $r$ for $r \in [1,3/2)$ and $M \in \{t_1(r),t_2(r),t_3(r)\}$.

In Equation (3), the first line is referred to as the non-degenerate case, and the second and third lines are referred to as degenerate cases with a.s. limits 1 and 3, respectively.
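The double integral in Equation (4) can be evaluated numerically. The sketch below rests on two assumptions that are not stated in the article: that the exponent in Equation (4) is negative (which is needed for the integral to converge), and a reduction to a one-dimensional integral that we derived by rescaling $w_i$ and switching to polar coordinates, namely $p_r = \frac{1}{2}\int_0^{\pi} \sin\phi\,/\,(1 + r(r-1)\sin\phi)^2\, d\phi$. The function name `p_r` and the use of Simpson's rule are our own choices.

```python
import math

def p_r(r, panels=2000):
    """Approximate p_r via Simpson's rule on the reduced one-dimensional integral.

    Assumes a negative exponent in Equation (4); `panels` must be even.
    """
    rho = r * (r - 1.0)

    def integrand(phi):
        s = math.sin(phi)
        return s / (1.0 + rho * s) ** 2

    h = math.pi / panels
    total = integrand(0.0) + integrand(math.pi)
    for i in range(1, panels):
        total += (4 if i % 2 else 2) * integrand(i * h)
    return 0.5 * total * h / 3.0

values = [p_r(r) for r in (1.0, 1.1, 1.2, 1.3, 1.4)]
```

Under these assumptions the reduced form is well defined at $r = 1$ even though Equation (4) has $r - 1$ in its denominators, and the computed values decrease as $r$ increases toward $3/2$.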

In the following sections, we define a region associated with the $\gamma = 1$ case in general. Then we give finite sample and asymptotic upper bounds for $\gamma_n(r,M)$. Then we derive the asymptotic distribution of $\gamma_n(r,M)$.

4 The $\Gamma_1$-Regions for $N_{PE}^r$

First, we define $\Gamma_1$-regions in general, describe the construction of the $\Gamma_1$-region of $N_{PE}^r$ for one point and multiple point data sets, and provide some results concerning $\Gamma_1$-regions.

4.1 Definition of $\Gamma_1$-Regions

Let $(\Omega,\mathcal{M})$ be a measurable space and consider the proximity map $N : \Omega \to 2^{\Omega}$. For any set $B \subseteq \Omega$, the $\Gamma_1$-region of $B$ associated with $N(\cdot)$ is defined to be the region $\Gamma_1^N(B) := \{z \in \Omega : B \subseteq N(z)\}$. For $x \in \Omega$, we denote $\Gamma_1^N(\{x\})$ as $\Gamma_1^N(x)$.

If $\mathcal{X}_n = \{X_1,X_2,\ldots,X_n\}$ is a set of $\Omega$-valued random variables, then $\Gamma_1^N(X_i)$, $i = 1,\ldots,n$, and $\Gamma_1^N(\mathcal{X}_n)$ are random sets. If the $X_i$ are iid, then so are the random sets $\Gamma_1^N(X_i)$.

Note that $\gamma(\mathcal{X}_n,N) = 1$ iff $\mathcal{X}_n \cap \Gamma_1^N(\mathcal{X}_n) \neq \emptyset$. Hence the name $\Gamma_1$-region.

It is trivial to see the following.

Proposition 4.1. For any proximity map $N$ and set $B \subseteq \Omega$, $\mathscr{R}_S(N) \subseteq \Gamma_1^N(B)$.

Lemma 4.2. For any proximity map $N$ and $B \subseteq \Omega$, $\Gamma_1^N(B) = \cap_{x \in B} \Gamma_1^N(x)$.

Proof: Given a particular type of proximity map $N$ and subset $B \subseteq \Omega$, $y \in \Gamma_1^N(B)$ iff $B \subseteq N(y)$ iff $x \in N(y)$ for all $x \in B$ iff $y \in \Gamma_1^N(x)$ for all $x \in B$ iff $y \in \cap_{x \in B} \Gamma_1^N(x)$. Hence the result follows. ■

A problem of interest is finding, if possible, a (proper) subset of $B$, say $G \subsetneq B$, such that $\Gamma_1^N(B) = \cap_{x \in G} \Gamma_1^N(x)$. This implies that only the points in $G$ will be active in determining $\Gamma_1^N(B)$.

For example, in $\mathbb{R}$ with $\mathcal{Y}_2 = \{0,1\}$, and $\mathcal{X}_n$ a set of iid random variables of size $n > 1$ from $F$ in $(0,1)$, $\Gamma_1^{N_S}(\mathcal{X}_n) = \left[X_{n:n}/2,\,(1+X_{1:n})/2\right]$. So the extrema (minimum and maximum) of the set $\mathcal{X}_n$ are sufficient to determine the $\Gamma_1$-region; i.e., $G = \{X_{1:n},X_{n:n}\}$ for $\mathcal{X}_n$ a set of iid random variables from a continuous distribution on $(0,1)$. Unfortunately, in the multi-dimensional case, there is no natural ordering that yields natural extrema such as minimum or maximum.
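The one-dimensional example can be checked directly. The sketch below is an illustration under stated assumptions: it uses the closed-ball version of $N_S$ with $\mathcal{Y}_2 = \{0,1\}$, takes the $\Gamma_1$-region as the closed interval $[X_{n:n}/2, (1+X_{1:n})/2]$, and the function names are hypothetical. It compares "some data point lies in the $\Gamma_1$-region" with a direct search for a dominating point.

```python
import random

def spherical_region(x):
    """Closed ball N_S(x) = [x - r(x), x + r(x)] with r(x) = min(x, 1 - x), for Y_2 = {0, 1}."""
    radius = min(x, 1.0 - x)
    return (x - radius, x + radius)

def gamma1_region(points):
    """Endpoints of the Gamma_1-region, determined by the minimum and maximum only."""
    return (max(points) / 2.0, (1.0 + min(points)) / 2.0)

def has_dominating_point(points):
    """Direct check: does some point's proximity region contain every point?"""
    for z in points:
        lo, hi = spherical_region(z)
        if all(lo <= x <= hi for x in points):
            return True
    return False

rng = random.Random(1)
samples = [[rng.random() for _ in range(rng.randint(2, 6))] for _ in range(200)]
agreement = all(
    has_dominating_point(pts)
    == any(gamma1_region(pts)[0] <= z <= gamma1_region(pts)[1] for z in pts)
    for pts in samples
)
```

Here `agreement` records whether the two criteria coincide on the simulated samples, which is what the characterization $\gamma(\mathcal{X}_n,N) = 1$ iff $\mathcal{X}_n \cap \Gamma_1^N(\mathcal{X}_n) \neq \emptyset$ predicts.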

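4.2 Construction of $\Gamma_1$-Region of a Point for $N_{PE}^r$

For $N_{PE}^r(\cdot,M)$, the $\Gamma_1$-region, denoted as $\Gamma_1^r(\cdot,M) := \Gamma_1^{N_{PE}^r}(\cdot,M)$, is constructed as follows; see also Figure 8. Let $\xi_j(r,x)$ be the line parallel to $e_j$ such that $\xi_j(r,x) \cap T(\mathcal{Y}_3) \neq \emptyset$ and $r\,d(y_j,\xi_j(r,x)) = d(y_j,\ell(y_j,x))$ for $j \in \{1,2,3\}$. Then

$\Gamma_1^r(x,M) = \cup_{j=1}^{3}\left[\Gamma_1^r(x,M) \cap R_M(y_j)\right]$

where $\Gamma_1^r(x,M) \cap R_M(y_j) = \{z \in R_M(y_j) : d(y_j,\ell(y_j,z)) \ge d(y_j,\xi_j(r,x))\}$ for $j \in \{1,2,3\}$.

Notice that $r \ge 1$ implies that $x \in \Gamma_1^r(x,M)$. Furthermore, $\lim_{r \to \infty} \Gamma_1^r(x,M) = T(\mathcal{Y}_3)$ for all $x \in T(\mathcal{Y}_3) \setminus \mathcal{Y}_3$, and so we define $\Gamma_1^{r=\infty}(x,M) = T(\mathcal{Y}_3)$ for all such $x$. For $x \in \mathcal{Y}_3$, $\Gamma_1^r(x,M) = \{x\}$ for all $r \in [1,\infty]$.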




Figure 8: Construction of the $\Gamma_1$-region, $\Gamma_1^{r=2}(x,M_C)$ (shaded region).

Notice that $\Gamma_1^r(x,M_C)$ is a convex hexagon for all $r \ge 2$ and $x \in T(\mathcal{Y}_3) \setminus \mathcal{Y}_3$ (since for such an $x$, $\Gamma_1^r(x,M_C)$ is bounded by $\xi_j(r,x)$ and $e_j$ for all $j \in \{1,2,3\}$; see also Figure 8); otherwise it is either a convex hexagon or a non-convex but star-shaped polygon, depending on the location of $x$ and the value of $r$.

4.3 The $\Gamma_1$-Region of a Multiple Point Data Set for $N_{PE}^r$

So far, we have described the $\Gamma_1$-region for a point $x \in T(\mathcal{Y}_3)$. For a set $\mathcal{X}_n$ of size $n$ in $T(\mathcal{Y}_3)$, the region $\Gamma_1^r(\mathcal{X}_n,M)$ can be specified by the edge extrema only. The (closest) edge extrema of a set $B$ in $T(\mathcal{Y}_3)$ are the points closest to the edges of $T(\mathcal{Y}_3)$, denoted $x_{e_j}$ for $j \in \{1,2,3\}$; that is, $x_{e_j} \in \arg\inf_{x \in B} d(x,e_j)$. Note that if $B = \mathcal{X}_n$ is a set of iid random variables of size $n$ from $F$, then the edge extrema, denoted $X_{e_j}(n)$, are random variables. Below, we show that the edge extrema are the active points in defining $\Gamma_1^r(\mathcal{X}_n,M)$.



Figure 9: The $\Gamma_1$-regions (the hatched regions) for $r = 2$ with seven $\mathcal{X}$ points iid $\mathcal{U}(T(\mathcal{Y}_3))$, where vertex regions are constructed with incenter $M_I$ (left) and circumcenter $M_{CC}$ (right) with orthogonal projection.



Proposition 4.3. Let $B$ be any set of $n$ distinct points in $T(\mathcal{Y}_3)$. For $r$-factor proportional-edge proximity maps with $M$-vertex regions, $\Gamma_1^r(B,M) = \cap_{k=1}^{3} \Gamma_1^r(x_{e_k},M)$.

Proof: Given $B = \{x_1,\ldots,x_n\}$ in $T(\mathcal{Y}_3)$. Note that

$\Gamma_1^r(B,M) \cap R_M(y_j) = \left[\cap_{i=1}^{n} \Gamma_1^r(x_i,M)\right] \cap R_M(y_j)$,

but by definition $x_{e_j} \in \arg\max_{x \in B} d(y_j,\xi_j(r,x))$, so

$\Gamma_1^r(B,M) \cap R_M(y_j) = \Gamma_1^r(x_{e_j},M) \cap R_M(y_j)$ for $j \in \{1,2,3\}$. (5)

Furthermore, $\Gamma_1^r(B,M) = \cup_{j=1}^{3}\left[\Gamma_1^r(x_{e_j},M) \cap R_M(y_j)\right]$, and

$\Gamma_1^r(x_{e_j},M) \cap R_M(y_j) = \cap_{k=1}^{3}\left[\Gamma_1^r(x_{e_k},M) \cap R_M(y_j)\right]$ for $j \in \{1,2,3\}$. (6)

Combining these two results in Equations (5) and (6), we obtain $\Gamma_1^r(B,M) = \cap_{k=1}^{3} \Gamma_1^r(x_{e_k},M)$. ■

From the above proposition, we see that the $\Gamma_1$-region for $B$ as in the proposition can also be written as the union of three regions of the form

$\Gamma_1^r(B,M) \cap R_M(y_j) = \{z \in R_M(y_j) : d(y_j,\ell(y_j,z)) \ge d(y_j,\xi_j(r,x_{e_j}))\}$ for $j \in \{1,2,3\}$.

See Figure 9 for the $\Gamma_1$-region for $r = 2$ with seven $\mathcal{X}$ points iid $\mathcal{U}(T(\mathcal{Y}_3))$. In the left figure, vertex regions are based on the incenter, while in the right figure, on the circumcenter with orthogonal projections to the edges. In either case $\mathcal{X}_n \cap \Gamma_1^{r=2}(\mathcal{X}_n,M)$ is nonempty, hence $\gamma_n(2,M) = 1$.

Below, we demonstrate that edge extrema are distinct with probability 1 as $n \to \infty$. Hence in the limit three distinct points suffice to determine the $\Gamma_1$-region.

Theorem 4.4. Let $\mathcal{X}_n$ be a set of iid random variables from $\mathcal{U}(T(\mathcal{Y}_3))$ and let $E_{c,3}(n)$ be the event that the (closest) edge extrema are distinct. Then $P(E_{c,3}(n)) \to 1$ as $n \to \infty$.

We can also define the regions associated with $\gamma(\mathcal{X}_n,N) = k$ for $k \le n$, called the $\Gamma_k$-region, for proximity map $N_{\mathcal{Y}_3}(\cdot)$ and set $B \subseteq \Omega$ for $k = 1,\ldots,n$ (see Ceyhan (2004)).



5 The Asymptotic Distribution of $\gamma_n(r,M)$

In this section, we first present a finite sample upper bound for $\gamma_n(r,M)$, then present the degenerate cases and the nondegenerate case of the asymptotic distribution of $\gamma_n(r,M)$ given in Equation (3).

5.1 An Upper Bound for $\gamma_n(r,M)$



Recall that by definition, $\gamma(\mathcal{X}_n,N) \le n$. We will seek an a.s. least upper bound for $\gamma(\mathcal{X}_n,N)$. Let $\mathcal{X}_n$ be a set of iid random variables from $F$ on $T(\mathcal{Y}_3)$ and let $\gamma(\mathcal{X}_n,N)$ be the domination number for the PCD based on a proximity map $N$. Denote the general a.s. least upper bound for $\gamma(\mathcal{X}_n,N)$ that works for all $n \ge 1$ and is independent of $n$ (which is called the $\kappa$-value in Ceyhan (2004)) as $\kappa(N) := \min\{k : \gamma(\mathcal{X}_n,N) \le k \text{ a.s. for all } n \ge 1\}$.

In $\mathbb{R}$ with $\mathcal{Y}_2 = \{0,1\}$, for $\mathcal{X}_n$ a set of iid random variables from $\mathcal{U}(0,1)$, $\gamma(\mathcal{X}_n,N_S) \le 2$ with equality holding with positive probability. Hence $\kappa(N_S) = 2$.

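Theorem 5.1. Let $\mathcal{X}_n$ be a set of iid random variables from $\mathcal{U}(T(\mathcal{Y}_3))$ and $M \in \mathbb{R}^2 \setminus \mathcal{Y}_3$. Then $\kappa(N_{PE}^r) = 3$ for $N_{PE}^r(\cdot,M)$.

Proof: For $N_{PE}^r(\cdot,M)$, pick the point closest to edge $e_j$ in vertex region $R_M(y_j)$; that is, pick $U_j \in \arg\min_{X \in \mathcal{X}_n \cap R_M(y_j)} d(X,e_j) = \arg\max_{X \in \mathcal{X}_n \cap R_M(y_j)} d(\ell(y_j,X),y_j)$ in the vertex region for which $\mathcal{X}_n \cap R_M(y_j) \neq \emptyset$ for $j \in \{1,2,3\}$ (note that as $n \to \infty$, $U_j$ is unique a.s. for each $j$, since $X$ is from $\mathcal{U}(T(\mathcal{Y}_3))$). Then $\mathcal{X}_n \cap R_M(y_j) \subseteq N_{PE}^r(U_j,M)$. Hence $\mathcal{X}_n \subseteq \cup_{j=1}^{3} N_{PE}^r(U_j,M)$. So $\gamma_n(r,M) \le 3$ with equality holding with positive probability. Thus $\kappa(N_{PE}^r) = 3$. ■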
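The covering argument in the proof can be mimicked in a short simulation. The sketch below is an illustration under our own assumptions, not the authors' code: it takes $M = M_C$ (CM-vertex regions), represents points by their barycentric coordinates (so that the vertex region of $x$ is the index of its largest coordinate and the distance to $e_j$ is proportional to the $j$th coordinate), and all function names are hypothetical.

```python
import random

def sample_uniform_triangle(n, rng):
    """n iid uniform points in a triangle, returned as barycentric coordinates."""
    pts = []
    for _ in range(n):
        u, v = sorted((rng.random(), rng.random()))
        pts.append((u, v - u, 1.0 - v))  # spacings of two uniforms are uniform on the simplex
    return pts

def vertex_region(x):
    """Index j of the CM-vertex region R(y_j) containing x."""
    return max(range(3), key=lambda k: x[k])

def catches(x, z, r):
    """True if z lies in N_PE^r(x, M_C)."""
    j = vertex_region(x)
    return 1.0 - z[j] <= r * (1.0 - x[j])

def closest_to_opposite_edge(points):
    """The points U_j: in each nonempty vertex region R(y_j), the point closest to e_j."""
    chosen = {}
    for x in points:
        j = vertex_region(x)
        if j not in chosen or x[j] < chosen[j][j]:
            chosen[j] = x
    return list(chosen.values())

def is_dominating(candidates, points, r):
    """True if every point is caught by at least one candidate."""
    return all(any(catches(u, z, r) for u in candidates) for z in points)

rng = random.Random(0)
checks = []
for _ in range(50):
    pts = sample_uniform_triangle(40, rng)
    u_points = closest_to_opposite_edge(pts)
    checks.append(len(u_points) <= 3 and is_dominating(u_points, pts, 1.0))
```

The check is run with $r = 1$, the smallest admissible value, since the proximity regions only grow with $r$; at most three points $U_j$ are selected, one per nonempty vertex region.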

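Below is a general result for the limiting distribution of $\gamma(\mathcal{X}_n,N)$ for $\mathcal{X}_n$ from a very broad family of distributions and for general $N(\cdot)$.

Lemma 5.2. Let $\mathscr{R}_S(N)$ be the superset region for the proximity map $N(\cdot)$ and $\mathcal{X}_n$ be a set of iid random variables from $F$ with $P_F(X \in \mathscr{R}_S(N)) > 0$. Then $\lim_{n \to \infty} P_F(\gamma(\mathcal{X}_n,N) = 1) = 1$.

Proof: Suppose $P_F(X \in \mathscr{R}_S(N)) > 0$. Recall that for any $x \in \mathscr{R}_S(N)$, we have $N(x) = \Omega$, so $\mathcal{X}_n \subseteq N(x)$; hence if $\mathcal{X}_n \cap \mathscr{R}_S(N) \neq \emptyset$, then $\gamma(\mathcal{X}_n,N) = 1$. Then $P(\mathcal{X}_n \cap \mathscr{R}_S(N) \neq \emptyset) \le P(\gamma(\mathcal{X}_n,N) = 1)$. But $P(\mathcal{X}_n \cap \mathscr{R}_S(N) \neq \emptyset) = 1 - P(\mathcal{X}_n \cap \mathscr{R}_S(N) = \emptyset) = 1 - \left[1 - P_F(X \in \mathscr{R}_S(N))\right]^n \to 1$ as $n \to \infty$, since $P_F(X \in \mathscr{R}_S(N)) > 0$. Hence $\lim_{n \to \infty} P(\gamma(\mathcal{X}_n,N) = 1) = 1$. ■

Remark 5.3. In particular, for $F = \mathcal{U}(T(\mathcal{Y}_3))$, the inequality $P_F(X \in \mathscr{R}_S(N)) > 0$ holds iff $A(\mathscr{R}_S(N)) > 0$, and then $P(\mathcal{X}_n \cap \mathscr{R}_S(N) \neq \emptyset) \to 1$. □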

For $\mathcal{Y}_2 = \{0,1\} \subset \mathbb{R}$, $\mathscr{R}_S(N_S) = \{1/2\}$, so Lemma 5.2 does not apply to $N_S$ in $\mathbb{R}$.

Recall that $\kappa(N_{PE}^r) = 3$; then

$1 \le E[\gamma_n(r,M)] \le 3$ and $0 \le \mathrm{Var}[\gamma_n(r,M)] \le 9/4$.

Furthermore, there is a stochastic ordering for $\gamma_n(r,M)$.

Theorem 5.4. Suppose $\mathcal{X}_n$ is a set of iid random variables from a continuous distribution $F$ on $T(\mathcal{Y}_3)$. Then for $r_1 < r_2$, we have $\gamma_n(r_2,M) \le^{ST} \gamma(\mathcal{X}_n,N_{PE}^{r_1},M)$.

Proof: Suppose $r_1 < r_2$. Then $P(\gamma_n(r_2,M) \le 1) \ge P(\gamma_n(r_1,M) \le 1)$, since $\Gamma_1^{r_1}(\mathcal{X}_n,M) \subseteq \Gamma_1^{r_2}(\mathcal{X}_n,M)$ for any realization of $\mathcal{X}_n$, and by a similar argument $P(\gamma_n(r_2,M) \le 2) \ge P(\gamma_n(r_1,M) \le 2)$, so $P(\gamma_n(r_2,M) \le 3) \ge P(\gamma_n(r_1,M) \le 3)$. Hence the desired result follows. ■



5.2 Geometry Invariance 

We present a "geometry invariance" result for N PE (-, M) where M- vertex regions are constructed using 
the lines joining 3^3 to M, rather than the orthogonal projections from M to the edges. This invariance 
property will simplify the notation in our subsequent analysis by allowing us to consider the special case of 
the equilateral triangle. 

Theorem 5.5. (Geometry Invariance Property) Suppose X n is a set of iid random variables from U (T '{y 3)) . 
Then for any r G [l,oo] the distribution of j n (r, M) is independent ofy^ and hence the geometry ofT(y 3 ). 

Proof: Suppose $X \sim \mathcal{U}(T(\mathcal{Y}_3))$. A composition of translation, rotation, reflections, and scaling will take
any given triangle $T(\mathcal{Y}_3) = T(\mathsf{y}_1, \mathsf{y}_2, \mathsf{y}_3)$ to the basic triangle $T_b = T((0,0), (1,0), (c_1, c_2))$ with $0 < c_1 \le 1/2$,
$c_2 > 0$, and $(1 - c_1)^2 + c_2^2 \le 1$. Furthermore, when $X$ is also transformed in the same manner, say to
$X'$, then $X'$ is uniform on $T_b$, i.e., $X' \sim \mathcal{U}(T_b)$. The transformation $\phi_e : \mathbb{R}^2 \to \mathbb{R}^2$ given by

$$\phi_e(u, v) = \left(u + \frac{1 - 2c_1}{2c_2}\, v,\ \frac{\sqrt{3}}{2c_2}\, v\right)$$

takes $T_b$ to the equilateral triangle $T_e = T((0,0), (1,0), (1/2, \sqrt{3}/2))$. Investigation of the
Jacobian shows that $\phi_e$ also preserves uniformity. That is, $\phi_e(X') \sim \mathcal{U}(T_e)$. Furthermore, the composition
of $\phi_e$ with the scaling and rigid body transformations maps the boundary of the original triangle, $T_o$, to
the boundary of the equilateral triangle, $T_e$, the lines joining $M$ to $\mathsf{y}_j$ in $T_b$ to the lines joining $\phi_e(M)$ to
$\phi_e(\mathsf{y}_j)$ in $T_e$, and lines parallel to the edges of $T_o$ to lines parallel to the edges of $T_e$. Since the distribution of
$\gamma_n(r, M)$ involves only the probability content of unions and intersections of regions bounded by precisely such
lines, and the probability content of such regions is preserved since uniformity is preserved, the desired result
follows. ■
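
The key step of this proof can be checked numerically. The sketch below builds the linear map that fixes $(0,0)$ and $(1,0)$ and sends $(c_1, c_2)$ to $(1/2, \sqrt{3}/2)$, and confirms that its Jacobian determinant is constant and equal to the ratio of the two triangle areas, which is what preserves uniformity. The function names and the sample values of $c_1, c_2$ are hypothetical.

```python
import numpy as np


def phi_e_matrix(c1: float, c2: float) -> np.ndarray:
    """Matrix of the linear map fixing (0,0) and (1,0) and sending (c1, c2)
    to (1/2, sqrt(3)/2)."""
    return np.array([[1.0, (1.0 - 2.0 * c1) / (2.0 * c2)],
                     [0.0, np.sqrt(3.0) / (2.0 * c2)]])


def map_points(A: np.ndarray, pts) -> np.ndarray:
    """Apply the linear map A to each row of pts."""
    return np.asarray(pts, dtype=float) @ A.T


if __name__ == "__main__":
    # Assumed basic triangle: 0 < c1 <= 1/2, c2 > 0, (1 - c1)^2 + c2^2 <= 1.
    c1, c2 = 0.3, 0.6
    A = phi_e_matrix(c1, c2)
    print(map_points(A, [(0, 0), (1, 0), (c1, c2)]))  # vertices of T_e
    print(np.linalg.det(A), (np.sqrt(3) / 4) / (c2 / 2))  # equal constants
```

Because the map is linear, it also sends lines through a vertex and $M$ to lines through the image vertex and $\phi_e(M)$, and parallel lines to parallel lines, which are the only boundaries entering the distribution of $\gamma_n(r, M)$.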

Note that geometry invariance of $\gamma(\mathcal{X}_n, N_{PE}^{r=\infty}, M)$ also follows trivially, since for $r = \infty$, we have
$\gamma_n(r = \infty, M) = 1$ a.s. for all $\mathcal{X}_n$ from any $F$ with support in $T(\mathcal{Y}_3) \setminus \mathcal{Y}_3$.

Based on Theorem 5.5 we may assume that $T(\mathcal{Y}_3)$ is a standard equilateral triangle with
$\mathcal{Y}_3 = \{(0, 0), (1, 0), (1/2, \sqrt{3}/2)\}$ for $N_{PE}^r(\cdot, M)$ with $M$-vertex regions.

Notice that we proved the geometry invariance property for $N_{PE}^r$ where $M$-vertex regions are defined with
the lines joining $\mathcal{Y}_3$ to $M$. On the other hand, if we use the orthogonal projections from $M$ to the edges, the
vertex regions, and hence $N_{PE}^r$, will depend on the geometry of the triangle. That is, the orthogonal projections
from $M$ to the edges will not be mapped to the orthogonal projections in the standard equilateral triangle.
Hence with the choice of the former type of $M$-vertex regions, it suffices to work on the standard equilateral
triangle. On the other hand, with the orthogonal projections, the exact and asymptotic distribution of $\gamma_n(r, M)$
will depend on $c_1, c_2$, so one needs to do the calculations for each possible combination of $c_1, c_2$.

5.3 The Degenerate Case with $\gamma_n(r, M) \to 1$

Below, we prove that $\gamma_n(r, M)$ is degenerate in the limit for $r > 3/2$.

Theorem 5.6. Suppose $\mathcal{X}_n$ is a set of iid random variables from a continuous distribution $F$ on $T(\mathcal{Y}_3)$. If
$M \notin \mathscr{T}^r$ (see the figure and equation that define $\mathscr{T}^r$), then $\lim_{n\to\infty} P(\gamma_n(r, M) = 1) = 1$ for all $M \in \mathbb{R}^2 \setminus \mathcal{Y}_3$.

Proof: Suppose $M \notin \mathscr{T}^r$. Then $\mathscr{R}_S(N_{PE}^r, M)$ is nonempty with positive area. Hence the result follows
by Lemma 5.2. ■

Corollary 5.7. Suppose $\mathcal{X}_n$ is a set of iid random variables from a continuous distribution $F$ on $T(\mathcal{Y}_3)$.
Then for $r > 3/2$, $\lim_{n\to\infty} P(\gamma_n(r, M) = 1) = 1$ for all $M \in \mathbb{R}^2 \setminus \mathcal{Y}_3$.

Proof: For $r > 3/2$, $\mathscr{T}^r = \emptyset$, so $M \notin \mathscr{T}^r$. Hence the result follows by Theorem 5.6. ■

We estimate the distribution of $\gamma_n(r, M)$ with $r = 2$ and $M = M_C$ for various $n$ empirically. In Table 1
(left), we present the empirical estimates of $\gamma_n(r, M)$ with $n = 10, 20, 30, 50, 100$ based on 1000 Monte Carlo
replicates in $T_e$. Observe that the empirical estimates are in agreement with the asymptotic distribution
given in Corollary 5.7.

Left panel ($r = 2$):

k\n |   10 |   20 |   30 |   50 |  100
1   |  961 | 1000 | 1000 | 1000 | 1000
2   |   34 |    0 |    0 |    0 |    0
3   |    5 |    0 |    0 |    0 |    0

Right panel ($r = 5/4$):

k\n |   10 |   20 |   30 |   50 |  100
1   |    9 |    0 |    0 |    0 |    0
2   |  293 |  110 |   30 |    8 |    0
3   |  698 |  890 |  970 |  992 | 1000

Table 1: The number of $\gamma_n(r, M) = k$ out of $N = 1000$ Monte Carlo replicates with $M = M_C$ and $r = 2$
(left) and $r = 5/4$ (right).
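
A simulation of this kind can be sketched in a few lines for $M = M_C$. The sketch below is not the authors' code; it rests on two labeled assumptions. First, by geometry invariance, uniform points in the triangle are generated directly as barycentric coordinates $(\lambda_1, \lambda_2, \lambda_3)$, and for $M = M_C$ the vertex region of $\mathsf{y}_j$ is taken to be the set where $\lambda_j$ is the largest coordinate. Second, the proximity region is restated in these coordinates as: a point $x$ in the vertex region of $\mathsf{y}_j$ has an arc to $z$ iff $\lambda_j(z) \ge 1 - r\,(1 - \lambda_j(x))$. Since these regions are nested within a vertex region, only the point closest to the edge $e_j$ in each vertex region needs to be considered as a candidate. The function names are hypothetical.

```python
import numpy as np


def domination_number(lam: np.ndarray, r: float) -> int:
    """Domination number (1, 2 or 3) for points given by barycentric
    coordinates lam (shape n x 3), assuming M = M_C."""
    region = lam.argmax(axis=1)  # vertex region of each point
    covered = []
    for j in range(3):
        in_region = lam[region == j, j]
        if in_region.size == 0:
            continue  # no candidate in this vertex region
        lam_q = in_region.min()  # candidate closest to edge e_j
        # Points z covered by the candidate: lam_j(z) >= 1 - r (1 - lam_j(q)).
        covered.append(lam[:, j] >= 1.0 - r * (1.0 - lam_q))
    if any(c.all() for c in covered):
        return 1
    for a in range(len(covered)):
        for b in range(a + 1, len(covered)):
            if (covered[a] | covered[b]).all():
                return 2
    return 3


def simulate(n: int, r: float, reps: int, seed: int = 0) -> dict:
    """Counts of the domination number over reps Monte Carlo replicates."""
    rng = np.random.default_rng(seed)
    counts = {1: 0, 2: 0, 3: 0}
    for _ in range(reps):
        lam = rng.dirichlet((1.0, 1.0, 1.0), size=n)  # uniform on the triangle
        counts[domination_number(lam, r)] += 1
    return counts


if __name__ == "__main__":
    for r in (2.0, 1.25):
        print(r, {n: simulate(n, r, 1000) for n in (10, 20, 30, 50, 100)})
```

The counts produced this way depend on the random seed, so they should be compared with Table 1 only in distribution, not entry by entry.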

The asymptotic distribution of $\gamma_n(r, M)$ for $r < 3/2$ depends on the relative position of $M$ with respect
to the triangle $\mathscr{T}^r$.

5.4 The Degenerate Case with $\gamma_n(r, M) \to 3$

Theorem 5.8. Suppose $\mathcal{X}_n$ is a set of iid random variables from a continuous distribution $F$ on $T(\mathcal{Y}_3)$. If
$M \in (\mathscr{T}^r)^o$, then $P(\gamma_n(r, M) = 3) \to 1$ as $n \to \infty$.



We estimate the distribution of $\gamma_n(r, M)$ with $r = 5/4$ and $M = M_C$ for various $n$ values empirically.
In Table 1 (right), we present the empirical estimates of $\gamma_n(r, M)$ with $n = 10, 20, 30, 50, 100$ based on
1000 Monte Carlo replicates in $T_e$. Observe that the empirical estimates are in agreement with our result in
Theorem 5.8.

Theorem 5.9. Suppose $\mathcal{X}_n$ is a set of iid random variables from $\mathcal{U}(T(\mathcal{Y}_3))$. If $M \in \partial(\mathscr{T}^r)$, then $P(\gamma_n(r, M) > 1) \to 1$ as $n \to \infty$.

For $M \in \partial(\mathscr{T}^r)$, there are two separate cases:

(i) $M \in \partial(\mathscr{T}^r) \setminus \{t_1(r), t_2(r), t_3(r)\}$ where $t_j(r)$ with $j \in \{1, 2, 3\}$ are the vertices of $\mathscr{T}^r$ whose explicit
forms are given in Equation (2).

(ii) $M \in \{t_1(r), t_2(r), t_3(r)\}$.

Theorem 5.10. Suppose $\mathcal{X}_n$ is a set of iid random variables from $\mathcal{U}(T(\mathcal{Y}_3))$. If $M \in \partial(\mathscr{T}^r) \setminus \{t_1(r), t_2(r), t_3(r)\}$,
then $P(\gamma_n(r, M) = 3) \to 1$ as $n \to \infty$.

We estimate the distribution of $\gamma_n(r, M)$ with $r = 5/4$ and $M = (3/5, \sqrt{3}/10) \in \partial(\mathscr{T}^r) \setminus \{t_1(r), t_2(r), t_3(r)\}$
for various $n$ empirically. In Table 2 we present empirical estimates of $\gamma_n(r, M)$ with $n = 10, 20, 30, 50, 100, 500,
1000, 2000$ based on 1000 Monte Carlo replicates in $T_e$. Observe that the empirical estimates are in agreement
with our result in Theorem 5.10.



k\n |  10 |  20 |  30 |  50 | 100 | 500 | 1000 | 2000
1   | 118 |  60 |  51 |  39 |  15 |   1 |    2 |    1
2   | 462 | 409 | 361 | 299 | 258 | 100 |   57 |   29
3   | 420 | 531 | 588 | 662 | 727 | 899 |  941 |  970

Table 2: The number of $\gamma_n(r, M) = k$ out of $N = 1000$ Monte Carlo replicates with $r = 5/4$ and $M =
(3/5, \sqrt{3}/10)$.



5.5 The Nondegenerate Case 

Theorem 5.11. Suppose $\mathcal{X}_n$ is a set of iid random variables from $\mathcal{U}(T(\mathcal{Y}_3))$. If $M \in \{t_1(r), t_2(r), t_3(r)\}$,
then $P(\gamma_n(r, M) = 2) \to p_r$ as $n \to \infty$, where $p_r \in (0, 1)$ is given by an integral expression (see the proof in the Appendix) that is only numerically
computable.

For example, $p_{r=5/4} \approx .6514$ and $p_{r=\sqrt{2}} \approx .4826$.

So the asymptotic distribution of $\gamma_n(r, M)$ with $r \in [1, 3/2)$ and $M \in \{t_1(r), t_2(r), t_3(r)\}$ is given by

$$\gamma_n(r, M) \sim 2 + \mathrm{BER}(1 - p_r). \qquad (7)$$

We estimate the distribution of $\gamma_n(r, M)$ with $r = 5/4$ and $M = (7/10, \sqrt{3}/10)$ for various $n$ empirically.
In Table 3 we present the empirical estimates of $\gamma_n(r, M)$ with $n = 10, 20, 30, 50, 100, 500, 1000, 2000$
based on 1000 Monte Carlo replicates in $T_e$. Observe that the empirical estimates are in agreement with our
result $p_{r=5/4} \approx .6514$.



k\n |  10 |  20 |  30 |  50 | 100 | 500 | 1000 | 2000
1   | 174 | 118 |  82 |  61 |  22 |   5 |    1 |    1
2   | 532 | 526 | 548 | 561 | 611 | 617 |  633 |  649
3   | 294 | 356 | 370 | 378 | 367 | 378 |  366 |  350

Table 3: The number of $\gamma_n(r, M) = k$ out of $N = 1000$ Monte Carlo replicates with $r = 5/4$ and $M =
(7/10, \sqrt{3}/10)$.
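
The agreement between Table 3 and $p_{r=5/4} \approx .6514$ can be quantified with a simple binomial standard error. The sketch below is an illustrative check rather than part of the original analysis: it takes the $k = 2$ row of Table 3 and computes, for each $n$, the standardized distance between the empirical proportion and $p_r$ under the assumption that the 1000 replicates are independent. The names `TABLE3_K2` and `z_score` are hypothetical.

```python
import math

# Row k = 2 of Table 3: counts of gamma_n = 2 out of 1000 replicates.
TABLE3_K2 = {10: 532, 20: 526, 30: 548, 50: 561,
             100: 611, 500: 617, 1000: 633, 2000: 649}
P_R = 0.6514    # asymptotic value quoted in the text for r = 5/4
N_REPS = 1000   # Monte Carlo replicates per column


def z_score(count: int, p: float = P_R, reps: int = N_REPS) -> float:
    """Standardized difference between the empirical proportion and p."""
    se = math.sqrt(p * (1.0 - p) / reps)
    return (count / reps - p) / se


if __name__ == "__main__":
    for n, count in TABLE3_K2.items():
        print(n, count / N_REPS, round(z_score(count), 2))
```

For small $n$ the empirical proportion is far below $p_r$, reflecting the finite-sample distribution, while for $n = 2000$ it lies well within one standard error of the limit.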
Remark 5.12. For $r = 3/2$, as $n \to \infty$, $P(\gamma_n(r, M_C) > 1) \to 1$ at rate $O(n^{-1})$. □

Theorem 5.13. Suppose $\mathcal{X}_n$ is a set of iid random variables from $\mathcal{U}(T(\mathcal{Y}_3))$. Then for $r = 3/2$, as $n \to \infty$,

$$\gamma_n(3/2, M_C) \sim 2 + \mathrm{BER}(p \approx .2587), \qquad (8)$$

where $p = 1 - p_{3/2}$ and $p_{3/2} \approx .7413$.

For the proof of Theorem 5.13 see Ceyhan and Priebe (2004, 2005).
Using Theorem 5.13,

$$\lim_{n\to\infty} E[\gamma_n(3/2, M_C)] = 3 - p_{3/2} \approx 2.2587 \qquad (9)$$

and

$$\lim_{n\to\infty} \mathrm{Var}[\gamma_n(3/2, M_C)] = p_{3/2} - p_{3/2}^2 \approx .1917. \qquad (10)$$

Indeed, the finite sample distribution of $\gamma_n(3/2, M_C)$, and hence the finite sample mean and variance, can also be
obtained by numerical methods.
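
The limiting mean and variance follow directly from the form $2 + \mathrm{BER}(1 - p_{3/2})$. A minimal sketch of the arithmetic, with a hypothetical function name and the value $p_{3/2} \approx .7413$ taken from the text:

```python
def shifted_bernoulli_moments(p2: float) -> tuple:
    """Mean and variance of 2 + BER(1 - p2), where p2 = P(gamma = 2)."""
    q = 1.0 - p2               # probability that gamma = 3
    mean = 2.0 + q             # equals 3 - p2
    variance = q * (1.0 - q)   # equals p2 - p2**2
    return mean, variance


if __name__ == "__main__":
    print(shifted_bernoulli_moments(0.7413))
```

With $p_{3/2} = .7413$ this gives a mean of about 2.2587 and a variance of about 0.19, consistent with Equations (9) and (10).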

We also estimate the distribution of $\gamma_n(3/2, M_C)$ for various $n$ values empirically. The empirical estimates
for $n = 10, 20, 30, 50, 100, 500, 1000, 2000$ based on 1000 Monte Carlo replicates are given in Table 4. The empirical
estimates are in agreement with our result $p_{r=3/2} \approx .7413$.



k\n |  10 |  20 |  30 |  50 | 100 | 500 | 1000 | 2000
1   | 151 |  82 |  61 |  50 |  27 |   2 |    3 |    1
2   | 602 | 636 | 688 | 693 | 718 | 753 |  729 |  749
3   | 247 | 282 | 251 | 257 | 255 | 245 |  268 |  250

Table 4: The number of $\gamma_n(3/2, M_C) = k$ out of $N = 1000$ Monte Carlo replicates.



5.6 Distribution of $\gamma_n(r, M)$ in Multiple Triangles

So far we have worked with data in one Delaunay triangle, i.e., $m = 3$ or $J_3 = 1$. In this section, we present
the asymptotic distribution of the domination number of $r$-factor PCDs in multiple Delaunay triangles.
Suppose $\mathcal{Y}_m = \{\mathsf{y}_1, \mathsf{y}_2, \ldots, \mathsf{y}_m\} \subset \mathbb{R}^2$ is a set of $m$ points in general position with $m > 3$ and no more than
3 points are cocircular. Then there are $J_m > 1$ Delaunay triangles, each of which is denoted as $T_j$. Let $M^j$
be the point in $T_j$ that corresponds to $M$ in $T_e$, $\mathscr{T}_j^r$ be the triangle that corresponds to $\mathscr{T}^r$ in $T_e$, and $t_i^j(r)$
be the vertices of $\mathscr{T}_j^r$ that correspond to $t_i(r)$ in $T_e$ for $i \in \{1, 2, 3\}$. Moreover, let $n_j := |\mathcal{X}_n \cap T_j|$, the
number of $\mathcal{X}$ points in Delaunay triangle $T_j$. For $\mathcal{X}_n \subset C_H(\mathcal{Y}_m)$, let $\gamma_{n_j}(r, M^j)$ be the domination number
of the digraph induced by vertices of $T_j$ and $\mathcal{X}_n \cap T_j$. Then the domination number of the $r$-factor PCD in
$J_m$ triangles is

$$\gamma_n(r, M, J_m) = \sum_{j=1}^{J_m} \gamma_{n_j}(r, M^j).$$

See Figure 10 (left) for the 77 $\mathcal{X}$ points that are in $C_H(\mathcal{Y}_m)$ out of the 200 $\mathcal{X}$ points plotted in Figure 1.
Observe that 10 $\mathcal{Y}$ points yield $J_{10} = 13$ Delaunay triangles. In Figure 10 (right) are the corresponding
arcs for $M = M_C$ and $r = 3/2$. The corresponding $\gamma_n = 22$. Suppose $\mathcal{X}_n$ is a set of iid random variables
from $\mathcal{U}(C_H(\mathcal{Y}_m))$, the uniform distribution on the convex hull of $\mathcal{Y}_m$, and we construct the $r$-factor PCDs using
the points $M^j$ that correspond to $M$ in $T_e$. Then for fixed $m$ (or fixed $J_m$), as $n \to \infty$, so does each $n_j$.
Furthermore, as $n \to \infty$, the components $\gamma_{n_j}(r, M^j)$ become independent. Therefore using Equation (7),
we can obtain the asymptotic distribution of $\gamma_n(r, M, J_m)$. As $n \to \infty$, for fixed $J_m$,

$$\gamma_n(r, M, J_m) \sim \begin{cases} 2J_m + \mathrm{BIN}(J_m, 1 - p_r), & \text{for } M^j \in \{t_1^j(r), t_2^j(r), t_3^j(r)\} \text{ and } r \in [1, 3/2], \\ J_m, & \text{for } r > 3/2, \\ 3J_m, & \text{for } M^j \in \mathscr{T}_j^r \setminus \{t_1^j(r), t_2^j(r), t_3^j(r)\} \text{ and } r \in [1, 3/2), \end{cases} \qquad (11)$$

where $\mathrm{BIN}(n, p)$ stands for the binomial distribution with $n$ trials and probability of success $p$; for $r \in [1, 3/2)$
and $M \in \{t_1(r), t_2(r), t_3(r)\}$, $p_r$ is as in Theorem 5.11, and for $r = 3/2$ and $M = M_C$, $p_r \approx .7413$ (see
Equation (8)).
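
The first case of Equation (11) is a shifted binomial, so its probability mass function and moments are straightforward to tabulate. The sketch below is illustrative only; the function name is hypothetical, and the example values $J_m = 13$ and $p_r = .7413$ are taken from the figure description and from the $r = 3/2$, $M = M_C$ case in the text.

```python
from math import comb


def multi_triangle_pmf(J: int, p_r: float) -> dict:
    """pmf of 2J + BIN(J, 1 - p_r): each of the J triangles contributes
    2 with probability p_r and 3 with probability 1 - p_r, independently."""
    q = 1.0 - p_r
    return {2 * J + k: comb(J, k) * q**k * p_r**(J - k) for k in range(J + 1)}


if __name__ == "__main__":
    pmf = multi_triangle_pmf(13, 0.7413)
    mean = sum(value * prob for value, prob in pmf.items())
    print(min(pmf), max(pmf), mean)
```

Note that this is a large-sample description: it requires every triangle to contain many points, so a finite-sample value such as $\gamma_n = 22$ with 77 points in 13 triangles can fall below the asymptotic support.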



Figure 10: The 77 $\mathcal{X}$ points (crosses) in the convex hull of $\mathcal{Y}$ points (circles) given in Figure 1 (left) and the
corresponding arcs (right) of the $r$-factor proportional-edge PCD with $r = 3/2$ and $M = M_C$.



5.7 Extension of $N_{PE}^r$ to Higher Dimensions

The extension to $\mathbb{R}^d$ for $d > 2$ with $M = M_C$ is provided in Ceyhan and Priebe (2005), but the extension
for general $M$ is similar.

Let $\gamma_n(r, M, d) := \gamma(\mathcal{X}_n, N_{PE}^r, M, d)$ be the domination number of the PCD based on the extension of
$N_{PE}^r(\cdot, M)$ to $\mathbb{R}^d$. Then it is easy to see that $\gamma_n(r, M, 3)$ is nondegenerate as $n \to \infty$ for $r = 4/3$. In $\mathbb{R}^d$, it
can be seen that $\gamma_n(r, M, d)$ is nondegenerate in the limit only when $r = (d+1)/d$. Furthermore, for large $d$, the
asymptotic distribution of $\gamma_n(r, M, d)$ is nondegenerate at values of $r$ closer to 1. Moreover, it can be shown
that $\lim_{n\to\infty} P(2 \le \gamma_n(r = (d+1)/d, M, d) \le d+1) = 1$ and we conjecture the following.

Conjecture 5.14. Suppose $\mathcal{X}_n$ is a set of iid random variables from the uniform distribution on a simplex in
$\mathbb{R}^d$. Then the domination number $\gamma_n(r, M)$ in the simplex satisfies

$$\lim_{n\to\infty} P(d \le \gamma_n((d+1)/d, M, d) \le d+1) = 1.$$

For instance, with $d = 3$ we estimate the empirical distribution of $\gamma(\mathcal{X}_n, 4/3)$ for various $n$. The empirical
estimates for $n = 10, 20, 30, 40, 50, 100, 200, 500, 1000, 2000$ based on 1000 Monte Carlo replicates for each
$n$ are given in Table 5.



k\n |  10 |  20 |  30 |  40 |  50 | 100 | 200 | 500 | 1000 | 2000
1   |  52 |  18 |   5 |   5 |   4 |   0 |   0 |   0 |    0 |    0
2   | 385 | 308 | 263 | 221 | 219 | 155 |  88 |  41 |   31 |   19
3   | 348 | 455 | 557 | 609 | 621 | 725 | 773 | 831 |  845 |  862
4   | 215 | 219 | 175 | 165 | 156 | 120 | 139 | 128 |  124 |  119

Table 5: The number of $\gamma_n(4/3, M_C) = k$ out of $N = 1000$ Monte Carlo replicates.
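
The counts in Table 5 can be turned into an empirical estimate of the probability in Conjecture 5.14 for $d = 3$, namely $P(3 \le \gamma_n \le 4)$. The sketch below is an illustrative tabulation only; the zero entries in the $k = 1$ row are an assumption, inferred from the requirement that every column sums to the 1000 replicates, and the variable names are hypothetical.

```python
NS = [10, 20, 30, 40, 50, 100, 200, 500, 1000, 2000]
TABLE5 = {
    1: [52, 18, 5, 5, 4, 0, 0, 0, 0, 0],  # trailing zeros inferred from column sums
    2: [385, 308, 263, 221, 219, 155, 88, 41, 31, 19],
    3: [348, 455, 557, 609, 621, 725, 773, 831, 845, 862],
    4: [215, 219, 175, 165, 156, 120, 139, 128, 124, 119],
}
N_REPS = 1000


def proportion_between(lo: int, hi: int) -> list:
    """Empirical P(lo <= gamma_n <= hi) for each sample size in NS."""
    return [sum(TABLE5[k][i] for k in range(lo, hi + 1)) / N_REPS
            for i in range(len(NS))]


if __name__ == "__main__":
    for n, prop in zip(NS, proportion_between(3, 4)):
        print(n, prop)
```

The proportion increases with $n$ over the tabulated sample sizes, which is the pattern the conjecture predicts, although a simulation of this size cannot by itself establish the limit.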



6 Discussion 

The $r$-factor proportional-edge proximity catch digraphs (PCDs), when compared to class cover catch digraphs (CCCDs), have some advantages. The asymptotic distribution of the domination number $\gamma_n(r, M)$
of the $r$-factor PCDs, unlike that of CCCDs, is mathematically tractable (computable by numerical integration). A minimum dominating set can be found in polynomial time for $r$-factor PCDs in $\mathbb{R}^d$ for all $d > 1$, but
finding a minimum dominating set is an NP-hard problem for CCCDs (except for $\mathbb{R}$). These nice properties
of $r$-factor PCDs are due to the geometry invariance of the distribution of $\gamma_n(r, M)$ for uniform data in triangles.

On the other hand, CCCDs are easily extendable to higher dimensions and are defined for all $\mathcal{X}_n \subset \mathbb{R}^d$,
while $r$-factor PCDs are only defined for $\mathcal{X}_n \subset C_H(\mathcal{Y}_m)$. Furthermore, the CCCDs based on balls use
proximity regions that are defined by the obvious metric, while the PCDs in general do not suggest a metric.
In particular, our $r$-factor PCDs are based on some sort of dissimilarity measure, but no metric underlying
this measure exists.

The finite sample distribution of $\gamma_n(r, M)$, although computationally tedious, can be found by numerical
methods, while that of CCCDs can only be empirically estimated by Monte Carlo simulations. Moreover,
we had to introduce many auxiliary tools to compute the distribution of $\gamma_n(r, M)$ in $\mathbb{R}^2$. The same tools will
work in higher dimensions, perhaps with more complicated geometry.

The $r$-factor PCDs have applications in classification and in testing spatial patterns of segregation or association. The former can be performed by building discriminant regions for classification in a manner analogous
to the procedure proposed in Priebe et al. (2003a), and the latter can be performed by using the asymptotic
distribution of $\gamma_n(r, M)$ similar to the procedure used in Ceyhan and Priebe (2005).
Appendix

First, we begin with a remark that introduces some terminology which we will use for asymptotics throughout this 
appendix. 

Remark 6.1. Suppose $\mathcal{X}_n$ is a set of iid random variables from $F$ with support $\mathcal{S}(F) \subseteq \Omega$. If over a sequence
$\Omega_n \subseteq \Omega$, $n = 1, 2, 3, \ldots$, $X$ restricted to $\Omega_n$, $X|_{\Omega_n}$, has distribution $F_n$ with $F_n(x) = F(x)/P_F(X \in \Omega_n)$ and
$P_F(X \in \Omega_n) \to 1$ as $n \to \infty$, then we call $F_n$ the asymptotically accurate distribution of $X$ and $\Omega_n$ the asymptotically
accurate support of $F$. If $F$ has density $f$, then $f_n = f(x)/P_F(X \in \Omega_n)$ is called the asymptotically accurate pdf of
$X$. In both cases, if we are concerned with asymptotic results, for simplicity we will, respectively, use $F$ and $f$ for
asymptotically accurate distribution and pdf. Conditioning will be implied by stating that $X \in \Omega_n$ with probability
1, as $n \to \infty$ or for sufficiently large $n$. □

Proof of Theorem 4.4

Without loss of generality, assume $T(\mathcal{Y}_3) = T_b = T((0,0), (1,0), (c_1, c_2))$. Note that the probability of edge extrema
all being equal to each other is $P(X_{e_1}(n) = X_{e_2}(n) = X_{e_3}(n)) = \mathbf{I}(n = 1)$. Let $E_{c,2}(n)$ be the event that there are
only two distinct (closest) edge extrema. Then for $n > 1$,

$$P(E_{c,2}(n)) = P(X_{e_1}(n) = X_{e_2}(n)) + P(X_{e_1}(n) = X_{e_3}(n)) + P(X_{e_2}(n) = X_{e_3}(n))$$

since the intersection of the events $\{X_{e_i}(n) = X_{e_j}(n)\}$ and $\{X_{e_i}(n) = X_{e_k}(n)\}$ for distinct $i, j, k$ is equivalent to the
event $\{X_{e_1}(n) = X_{e_2}(n) = X_{e_3}(n)\}$. Notice also that $P(E_{c,2}(n = 2)) = 1$. So, for $n > 2$, there are two or three
distinct edge extrema with probability 1. Hence $P(E_{c,3}(n)) + P(E_{c,2}(n)) = 1$ for $n > 2$.

By simple integral calculus, we can show that $P(E_{c,2}(n)) \to 0$ as $n \to \infty$, which will imply the desired result. ■



Proof of Theorem 5.8

Note that $(\mathscr{T}^r)^o \neq \emptyset$ iff $r < 3/2$. Suppose $M \in (\mathscr{T}^r)^o$. Then for any point $u$ in $R_M(\mathsf{y}_j)$, $N_{PE}^r(u, M) \subsetneq T(\mathcal{Y}_3)$,
because there is a tiny strip adjacent to edge $e_j$ not covered by $N_{PE}^r(u, M)$, for each $j \in \{1, 2, 3\}$. Then $N_{PE}^r(u, M) \cup
N_{PE}^r(v, M) \subsetneq T(\mathcal{Y}_3)$ for all $(u, v) \in R_M(\mathsf{y}_1) \times R_M(\mathsf{y}_2)$. Pick $\sup_{(u,v) \in R_M(\mathsf{y}_1) \times R_M(\mathsf{y}_2)} N_{PE}^r(u, M) \cup N_{PE}^r(v, M) \subsetneq T(\mathcal{Y}_3)$.
Then $T(\mathcal{Y}_3) \setminus \left[\sup_{(u,v) \in R_M(\mathsf{y}_1) \times R_M(\mathsf{y}_2)} N_{PE}^r(u, M) \cup N_{PE}^r(v, M)\right]$ has positive area. So

$$\mathcal{X}_n \cap \left[T(\mathcal{Y}_3) \setminus \left[\sup_{(u,v) \in R_M(\mathsf{y}_1) \times R_M(\mathsf{y}_2)} N_{PE}^r(u, M) \cup N_{PE}^r(v, M)\right]\right] \neq \emptyset$$

with probability 1 for sufficiently large $n$. (The supremum of a set functional $A(x)$ over a range $B$ is defined as the set
$S := \sup_{x \in B} A(x)$ such that $S$ is the smallest set satisfying $A(x) \subseteq S$ for all $x \in B$.) Then at least three points, one
for each vertex region, are required to dominate $\mathcal{X}_n$. Hence for sufficiently large $n$, $\gamma_n(r, M) \ge 3$ with probability
1, but $\kappa(N_{PE}^r) = 3$, as recalled in Section 5. Then $\lim_{n\to\infty} P(\gamma_n(r, M) = 3) = 1$ for $r < 3/2$. ■



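Proof of Theorem 5.9

Let $M = (m_1, m_2) \in \partial(\mathscr{T}^r)$, say $M \in q_3(r, x)$ (recall that $q_j(r, x)$ are defined such that $d(\mathsf{y}_j, e_j) = r \cdot d(q_j(r, x), \mathsf{y}_j)$
for $j \in \{1, 2, 3\}$); then $m_2 = \frac{\sqrt{3}(r-1)}{2r}$ and $m_1 \in \left[\frac{3(r-1)}{2r}, \frac{3-r}{2r}\right]$. Let $X_{e_j}(n)$ be one of the closest point(s) to the edge
$e_j$; i.e., $X_{e_j}(n) \in \arg\min_{X \in \mathcal{X}_n} d(X, e_j)$ for $j \in \{1, 2, 3\}$. Note that $X_{e_j}(n)$ is unique a.s. for each $j$.

Notice that for all $j \in \{1, 2, 3\}$, $X_{e_j}(n) \notin N_{PE}^r(X)$ for all $X \in \mathcal{X}_n \cap R_M(\mathsf{y}_j)$ implies that $\gamma_n(r, M) > 1$ with
probability 1. For sufficiently large $n$, $X_{e_j}(n) \notin N_{PE}^r(X)$ for all $X \in \mathcal{X}_n \cap R_M(\mathsf{y}_j)$ with probability 1, for $j \in \{1, 2\}$,
by the choice of $M$. Hence we consider only $X_{e_3}(n)$. The asymptotically accurate pdf of $X_{e_3}(n)$ is

$$f_{e_3}(x, y) = n \left(\frac{A(S_U(x, y))}{A(T(\mathcal{Y}_3))}\right)^{n-1} \frac{1}{A(T(\mathcal{Y}_3))},$$

where $S_U(x, y)$ is the unshaded region in Figure 11 (left) (for a given $X_{e_3}(n) = x_{e_3} = (x, y)$) whose area is
$A(S_U(x, y)) = \sqrt{3}\,(2y - \sqrt{3})^2/12$. Note that $X_{e_3}(n) \notin N_{PE}^r(X)$ for all $X \in \mathcal{X}_n \cap R_M(\mathsf{y}_3)$ iff $\mathcal{X}_n \cap [\Gamma_1^r(\mathcal{X}_n, M) \cap
R_M(\mathsf{y}_3)] = \emptyset$. Then given $X_{e_3}(n) = (x, y)$,

$$P\left(\mathcal{X}_n \cap [\Gamma_1^r(\mathcal{X}_n, M) \cap R_M(\mathsf{y}_3)] = \emptyset\right) = \left(\frac{A(S_U(x, y)) - A(\Gamma_1^r(\mathcal{X}_n, M) \cap R_M(\mathsf{y}_3))}{A(S_U(x, y))}\right)^{n-1}$$

where $A(\Gamma_1^r(\mathcal{X}_n, M) \cap R_M(\mathsf{y}_3)) = \frac{\sqrt{3}\, y^2}{3(r-1)r}$ (see Figure 11 (right), where the shaded region is $\Gamma_1^r(\mathcal{X}_n, M) \cap R_M(\mathsf{y}_3)$
for a given $X_{e_3}(n) = (x, y)$); then for sufficiently large $n$

$$P\left(\mathcal{X}_n \cap [\Gamma_1^r(\mathcal{X}_n, M) \cap R_M(\mathsf{y}_3)] = \emptyset\right) \approx \int\!\!\int \left(\frac{A(S_U(x, y)) - A(\Gamma_1^r(\mathcal{X}_n, M) \cap R_M(\mathsf{y}_3))}{A(S_U(x, y))}\right)^{n-1} f_{e_3}(x, y)\, dy\, dx$$

$$= \int\!\!\int \frac{n}{A(T(\mathcal{Y}_3))} \left(\frac{A(S_U(x, y)) - A(\Gamma_1^r(\mathcal{X}_n, M) \cap R_M(\mathsf{y}_3))}{A(T(\mathcal{Y}_3))}\right)^{n-1} dy\, dx.$$

Let

$$G(x, y) = \frac{A(S_U(x, y)) - A(\Gamma_1^r(\mathcal{X}_n, M) \cap R_M(\mathsf{y}_3))}{A(T(\mathcal{Y}_3))} = \frac{4}{\sqrt{3}} \left(\frac{\sqrt{3}\,(2y - \sqrt{3})^2}{12} - \frac{\sqrt{3}\, y^2}{3(r-1)r}\right),$$

which does not depend on $x$, so we denote it as $G(y)$.

Let $\varepsilon > 0$ be sufficiently small; then for sufficiently large $n$,

$$P\left(\mathcal{X}_n \cap [\Gamma_1^r(\mathcal{X}_n, M) \cap R_M(\mathsf{y}_3)] = \emptyset\right) \approx \int_0^{\varepsilon} \int_{y/\sqrt{3}}^{1 - y/\sqrt{3}} n\, G(y)^{n-1}\, \frac{4}{\sqrt{3}}\, dx\, dy = \int_0^{\varepsilon} \left(1 - \frac{2y}{\sqrt{3}}\right) n\, G(y)^{n-1}\, \frac{4}{\sqrt{3}}\, dy.$$

The integrand is critical at $y = 0$, since $G(0) = 1$ (i.e., when $x_{e_3} \in e_3$). Furthermore, $G(y) = 1 - 4y/\sqrt{3} + O(y^2)$
around $y = 0$. Then letting $y = w/n$, we get

$$P\left(\mathcal{X}_n \cap [\Gamma_1^r(\mathcal{X}_n, M) \cap R_M(\mathsf{y}_3)] = \emptyset\right) \approx \int_0^{n\varepsilon} \left(1 - \frac{2w}{\sqrt{3}\, n}\right) \left(1 - \frac{4w}{\sqrt{3}\, n} + O(n^{-2})\right)^{n-1} \frac{4}{\sqrt{3}}\, dw,$$

and letting $n \to \infty$, this converges to $\int_0^{\infty} \frac{4}{\sqrt{3}} \exp\left(-4w/\sqrt{3}\right) dw = 1$.

Hence $\lim_{n\to\infty} P(\gamma_n(r, M) > 1) = 1$. For $M \in q_j(r, x) \cap \mathscr{T}^r$ with $j \in \{1, 2\}$ the result follows similarly. ■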
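
The limit in this proof can be checked numerically. The sketch below evaluates $\int_0^{\varepsilon} (1 - 2y/\sqrt{3})\, n\, G(y)^{n-1}\, (4/\sqrt{3})\, dy$ by the trapezoid rule, assuming $G(y) = (2y - \sqrt{3})^2/3 - 4y^2/(3(r-1)r)$, which is the form obtained from the two areas quoted in the proof for the standard equilateral triangle. The function name, the choice $r = 5/4$, the cutoff $\varepsilon = 0.1$ and the grid size are all illustrative assumptions.

```python
import numpy as np


def prob_not_dominated(n: int, r: float = 1.25, eps: float = 0.1,
                       grid: int = 200_001) -> float:
    """Trapezoid-rule value of
    int_0^eps (1 - 2y/sqrt3) * n * G(y)^(n-1) * 4/sqrt3 dy."""
    y = np.linspace(0.0, eps, grid)
    G = (2.0 * y - np.sqrt(3.0)) ** 2 / 3.0 - 4.0 * y**2 / (3.0 * (r - 1.0) * r)
    f = (1.0 - 2.0 * y / np.sqrt(3.0)) * n * G ** (n - 1) * 4.0 / np.sqrt(3.0)
    h = y[1] - y[0]
    return float(h * (f.sum() - 0.5 * (f[0] + f[-1])))


if __name__ == "__main__":
    for n in (100, 1000, 10_000):
        print(n, prob_not_dominated(n))
```

The computed values approach 1 from below as $n$ grows, in line with the $n \to \infty$ limit derived above.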



Figure 11: A figure for the description of the pdf of $X_{e_3}(n)$ (left) and $\Gamma_1^r(\mathcal{X}_n, M)$ (right) given $X_{e_3}(n) =
x_{e_3} = (x, y)$.



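Proof of Theorem 5.10

Let $M = (m_1, m_2) \in \partial(\mathscr{T}^r) \setminus \{t_1(r), t_2(r), t_3(r)\}$, say $M \in q_3(r, x)$. Then $m_2 = \frac{\sqrt{3}(r-1)}{2r}$. Without loss of generality,
assume $\frac{1}{2} \le m_1 < \frac{3-r}{2r}$. See also Figure 12.

Figure 12: A figure for the description of the pdf of $Q_1(n)$ and $Q_3(n)$ (left) and the unshaded region is
$N_{PE}^r(q_1, M) \cup N_{PE}^r(q_3, M)$ (right).

Whenever $\mathcal{X}_n \cap R_M(\mathsf{y}_j) \neq \emptyset$, let

$$Q_j(n) \in \arg\min_{X \in \mathcal{X}_n \cap R_M(\mathsf{y}_j)} d(X, e_j) = \arg\max_{X \in \mathcal{X}_n \cap R_M(\mathsf{y}_j)} d(\ell(\mathsf{y}_j, X), \mathsf{y}_j) \quad \text{for } j \in \{1, 2, 3\}.$$

Note that at least one of the $Q_j(n)$ uniquely exists w.p. 1 for finite $n$, and as $n \to \infty$, the $Q_j(n)$ are unique w.p. 1. Then
$\gamma_n(r, M) \le 2$ iff $\mathcal{X}_n \subset [N_{PE}^r(Q_1(n), M) \cup N_{PE}^r(Q_2(n), M)]$ or
$\mathcal{X}_n \subset [N_{PE}^r(Q_2(n), M) \cup N_{PE}^r(Q_3(n), M)]$ or $\mathcal{X}_n \subset [N_{PE}^r(Q_1(n), M) \cup N_{PE}^r(Q_3(n), M)]$.

Let $E_n^{i,j}$ be the event that $\mathcal{X}_n \subset [N_{PE}^r(Q_i(n), M) \cup N_{PE}^r(Q_j(n), M)]$ for $(i, j) \in \{(1,2), (1,3), (2,3)\}$. Then

$$P(\gamma_n(r, M) \le 2) = P(E_n^{1,2}) + P(E_n^{2,3}) + P(E_n^{1,3}) - P(E_n^{1,2} \cap E_n^{2,3}) - P(E_n^{1,2} \cap E_n^{1,3}) - P(E_n^{1,3} \cap E_n^{2,3}) + P(E_n^{1,2} \cap E_n^{2,3} \cap E_n^{1,3}).$$

But note that $P(E_n^{1,2}) \to 0$ as $n \to \infty$ by the choice of $M$, since

$$\sup_{u \in R_M(\mathsf{y}_1),\, v \in R_M(\mathsf{y}_2)} N_{PE}^r(u, M) \cup N_{PE}^r(v, M) \subsetneq T(\mathcal{Y}_3),$$

and

$$P\left(\mathcal{X}_n \cap \left[T(\mathcal{Y}_3) \setminus \sup_{u \in R_M(\mathsf{y}_1),\, v \in R_M(\mathsf{y}_2)} N_{PE}^r(u, M) \cup N_{PE}^r(v, M)\right] \neq \emptyset\right) \to 1 \text{ as } n \to \infty.$$

Then

$$P(E_n^{1,2}) - P(E_n^{1,2} \cap E_n^{2,3}) - P(E_n^{1,2} \cap E_n^{1,3}) + P(E_n^{1,2} \cap E_n^{2,3} \cap E_n^{1,3}) \le 4\, P(E_n^{1,2}) \to 0 \text{ as } n \to \infty.$$

Therefore,

$$\lim_{n\to\infty} P(\gamma_n(r, M) \le 2) = \lim_{n\to\infty} \left(P(E_n^{2,3}) + P(E_n^{1,3})\right).$$

Furthermore, observe that $P(E_n^{1,3}) \ge P(E_n^{2,3})$ by the choice of $M$. Then we first find $\lim_{n\to\infty} P(E_n^{1,3})$. Given
a realization of $\mathcal{X}_n$ with $Q_1(n) = q_1 = (x_1, y_1)$ and $Q_3(n) = q_3 = (x_3, y_3)$, the remaining $n - 2$ points should fall, for
example, in the unshaded region in Figure 12 (left). Then the asymptotically accurate joint pdf of $Q_1(n), Q_3(n)$ is

$$f_{13}(\zeta) = \frac{n(n-1)}{A(T(\mathcal{Y}_3))^2} \left(\frac{A(T(\mathcal{Y}_3)) - A(S_R(\zeta))}{A(T(\mathcal{Y}_3))}\right)^{n-2}$$

where $\zeta = (x_1, y_1, x_3, y_3)$ and $S_R(\zeta)$ is the shaded region in Figure 12 (left), whose area is

$$A(S_R(\zeta)) = \frac{\sqrt{3}\left(2 r y_3 - \sqrt{3}(r-1)\right)^2}{12\, r (r-1)} + \frac{\sqrt{3}\left[2\sqrt{3}\, r y_1 - 3(r-1) + 6 r (x_1 - m_1)\right]^2}{72\, r \left(1 - r(2 m_1 - 1)\right)}.$$

Given $Q_j(n) = q_j = (x_j, y_j)$ for $j \in \{1, 3\}$,

$$P(E_n^{1,3}) = \left(\frac{A(N_{PE}^r(q_1, M) \cup N_{PE}^r(q_3, M)) - A(S_R(\zeta))}{A(T(\mathcal{Y}_3)) - A(S_R(\zeta))}\right)^{n-2},$$

then for sufficiently large $n$

$$P(E_n^{1,3}) \approx \int \left(\frac{A(N_{PE}^r(q_1, M) \cup N_{PE}^r(q_3, M)) - A(S_R(\zeta))}{A(T(\mathcal{Y}_3)) - A(S_R(\zeta))}\right)^{n-2} f_{13}(\zeta)\, d\zeta = \int \frac{n(n-1)}{A(T(\mathcal{Y}_3))^2} \left(\frac{A(N_{PE}^r(q_1, M) \cup N_{PE}^r(q_3, M)) - A(S_R(\zeta))}{A(T(\mathcal{Y}_3))}\right)^{n-2} d\zeta$$

where

$$A(N_{PE}^r(q_1, M) \cup N_{PE}^r(q_3, M)) = \frac{\sqrt{3}}{4} - \frac{\left(\sqrt{3}\, r y_1 + 3 r x_1 - 3\right)\left(\sqrt{3}(r-1) - 2 r y_3\right)}{6}.$$

See Figure 12 (right) for $N_{PE}^r(q_1, M) \cup N_{PE}^r(q_3, M)$. Let

$$G(\zeta) = \frac{A(N_{PE}^r(q_1, M) \cup N_{PE}^r(q_3, M)) - A(S_R(\zeta))}{A(T(\mathcal{Y}_3))}.$$

Note that the integral is critical at $x_1 = x_3 = m_1$ and $y_1 = y_3 = m_2$, since there $G(\zeta) = 1$. Since $N_{PE}^r(x, M)$ depends
on the distance $d(x, e_j)$ for $x \in R_M(\mathsf{y}_j)$, we make the change of variables $(x_1, y_1) \to (d(M, e_1) + z_1, y_1)$, i.e., $z_1 = d(q_1, e_1) - d(M, e_1)$, where
$d(M, e_1) = \frac{\sqrt{3}(r + 1 - 2 r m_1)}{4r}$, and $(x_3, y_3) \to (x_3, m_2 + z_3)$. Then $G(\zeta)$ depends only on $z_1, z_3$; we denote it $G(z_1, z_3)$,
which is

$$G(z_1, z_3) = 1 - \frac{8 r z_1^2}{3(1 + r(1 - 2 m_1))} - \frac{4 r z_3^2}{3(r-1)} - \frac{2 r z_3 \left(\sqrt{3}(3 - r) + r(4 z_1 - 2\sqrt{3}\, m_1)\right)}{3}.$$

The new integrand is $\frac{n(n-1)}{A(T(\mathcal{Y}_3))^2} G(z_1, z_3)^{n-2}$. Integrating with respect to $x_3$ and $y_1$ yields $\frac{2\sqrt{3}\, r z_3}{3(r-1)}$ and $\frac{4\sqrt{3}\, r z_1}{3(r + 1 - 2 r m_1)}$,
respectively. Hence for sufficiently large $n$

$$P(E_n^{1,3}) \approx \int_0^{\varepsilon}\!\!\int_0^{\varepsilon} \frac{n(n-1)}{A(T(\mathcal{Y}_3))^2} \left(\frac{2\sqrt{3}\, r z_3}{3(r-1)}\right) \left(\frac{4\sqrt{3}\, r z_1}{3(r + 1 - 2 r m_1)}\right) G(z_1, z_3)^{n-2}\, dz_3\, dz_1.$$

Note that the new integral is critical when $z_1 = z_3 = 0$, so we make the change of variables $z_1 = w_1/\sqrt{n}$ and
$z_3 = w_3/n$; then $G(z_1, z_3)$ becomes

$$G(w_1, w_3) = 1 + \frac{1}{n}\left(\frac{2\sqrt{3}\, r (r - 3 + 2 r m_1)}{3}\, w_3 - \frac{8 r}{3(r + 1 - 2 r m_1)}\, w_1^2\right) + O\left(n^{-3/2}\right),$$

so for sufficiently large $n$

$$P(E_n^{1,3}) \approx \frac{n-1}{n^2}\, \frac{128\, r^2}{9 (r-1)(r + 1 - 2 r m_1)} \int_0^{\sqrt{n}\,\varepsilon}\!\!\int_0^{n \varepsilon} w_1 w_3 \left[G(w_1, w_3)\right]^{n-2} dw_3\, dw_1 = O\left(n^{-1}\right),$$

since

$$\int_0^{\infty}\!\!\int_0^{\infty} w_1 w_3 \exp\left(\frac{2\sqrt{3}\, r (r - 3 + 2 r m_1)}{3}\, w_3 - \frac{8 r}{3(r + 1 - 2 r m_1)}\, w_1^2\right) dw_3\, dw_1 = \frac{9 (r + 1 - 2 r m_1)}{64\, r^3 (3 - r - 2 r m_1)^2},$$

which is a finite constant because $r - 3 + 2 r m_1 < 0$ for $m_1 < \frac{3-r}{2r}$.

Then $P(E_n^{1,3}) \to 0$ as $n \to \infty$, which also implies $P(E_n^{2,3}) \to 0$ as $n \to \infty$. Then $P(\gamma_n(r, M) \le 2) \to 0$.
Hence the desired result follows. ■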

Proof of Theorem 5.11

Let $M = (m_1, m_2) \in \{t_1(r), t_2(r), t_3(r)\}$. Without loss of generality, assume $M = t_2(r)$; then $m_1 = \frac{2 - r + c_1 (r-1)}{r}$ and
$m_2 = \frac{c_2 (r-1)}{r}$, which in $T_e$ (where $c_1 = 1/2$ and $c_2 = \sqrt{3}/2$) give $m_1 = \frac{3-r}{2r}$ and $m_2 = \frac{\sqrt{3}(r-1)}{2r}$. See Figure 13.

Figure 13: A figure for the description of the pdf of $Q_1(n)$ and $Q_3(n)$ (left) and the unshaded region is
$N_{PE}^r(q_1, M) \cup N_{PE}^r(q_3, M)$ (right) given $Q_j(n) = q_j$ for $j \in \{1, 3\}$.

Let $Q_j(n)$ and the events $E_n^{i,j}$ be defined as in the proof of Theorem 5.10 for $(i, j) \in \{(1,2), (1,3), (2,3)\}$. Then
as in the proof of Theorem 5.10,

$$P(\gamma_n(r, M) \le 2) = P(E_n^{1,2}) + P(E_n^{2,3}) + P(E_n^{1,3}) - P(E_n^{1,2} \cap E_n^{2,3}) - P(E_n^{1,2} \cap E_n^{1,3}) - P(E_n^{1,3} \cap E_n^{2,3}) + P(E_n^{1,2} \cap E_n^{2,3} \cap E_n^{1,3}).$$

Observe that the choice of $M$ implies that $P(E_n^{1,3}) \ge P(E_n^{2,3})$ and by symmetry (in $T_e$) $P(E_n^{1,2}) = P(E_n^{2,3})$.
So first we find $P(E_n^{1,3})$. As in the proof of Theorem 5.10, the asymptotically accurate joint pdf of $Q_1(n), Q_3(n)$ is

$$f_{13}(\zeta) = \frac{n(n-1)}{A(T(\mathcal{Y}_3))^2} \left(\frac{A(T(\mathcal{Y}_3)) - A(S_R(\zeta))}{A(T(\mathcal{Y}_3))}\right)^{n-2}$$

where $\zeta = (x_1, y_1, x_3, y_3)$ and $S_R(\zeta)$ is the shaded region in Figure 13 (left) whose area is

$$A(S_R(\zeta)) = \frac{\sqrt{3}\left(2 r y_3 - \sqrt{3}(r-1)\right)^2}{12 (r-1) r} + \frac{\sqrt{3}\left(\sqrt{3}\, r y_1 + 3 x_1 r - 3\right)^2}{36 (r-1) r}.$$

Given $Q_j(n) = q_j = (x_j, y_j)$ for $j \in \{1, 3\}$,

$$P(E_n^{1,3}) = \left(\frac{A(N_{PE}^r(q_1, M) \cup N_{PE}^r(q_3, M)) - A(S_R(\zeta))}{A(T(\mathcal{Y}_3)) - A(S_R(\zeta))}\right)^{n-2},$$

then for sufficiently large $n$

$$P(E_n^{1,3}) \approx \int \frac{n(n-1)}{A(T(\mathcal{Y}_3))^2} \left(\frac{A(N_{PE}^r(q_1, M) \cup N_{PE}^r(q_3, M)) - A(S_R(\zeta))}{A(T(\mathcal{Y}_3))}\right)^{n-2} d\zeta$$

where

$$A(N_{PE}^r(q_1, M) \cup N_{PE}^r(q_3, M)) = \frac{\sqrt{3}}{4} - \frac{\left(2 r y_3 - \sqrt{3}(r-1)\right)\left(3 - \sqrt{3}\, r y_1 - 3 r x_1\right)}{6}.$$

See Figure 13 (right) for $N_{PE}^r(q_1, M) \cup N_{PE}^r(q_3, M)$. Let

$$G(\zeta) = \frac{A(N_{PE}^r(q_1, M) \cup N_{PE}^r(q_3, M)) - A(S_R(\zeta))}{A(T(\mathcal{Y}_3))}.$$

Note that the integral is critical when $x_1 = x_3 = m_1$ and $y_1 = y_3 = m_2$, since there $G(\zeta) = 1$.

As in the proof of Theorem 5.10, we make the change of variables $(x_1, y_1) \to (d(M, e_1) + z_1, y_1)$, i.e., $z_1 = d(q_1, e_1) - d(M, e_1)$, where $d(M, e_1) = \frac{\sqrt{3}(r-1)}{2r}$,
and $(x_3, y_3) \to (x_3, m_2 + z_3)$. Then $G(\zeta)$ becomes

$$G(z_1, z_3) = 1 - \frac{4 r}{3(r-1)}\, z_1^2 - \frac{4 r}{3(r-1)}\, z_3^2 - \frac{8 r^2}{3}\, z_1 z_3.$$

The new integral is

$$\int \frac{n(n-1)}{A(T(\mathcal{Y}_3))^2}\, G(z_1, z_3)^{n-2}\, dx_3\, dy_1\, dz_3\, dz_1.$$

Note that $G(z_1, z_3)$ is independent of $y_1, x_3$, so integrating with respect to $x_3$ and $y_1$ yields $\frac{2\sqrt{3}\, r z_3}{3(r-1)}$ and $\frac{2\sqrt{3}\, r z_1}{3(r-1)}$,
respectively. The new integral is critical at $z_1 = z_3 = 0$. Hence, for sufficiently large $n$ and sufficiently small $\varepsilon > 0$,
the integral becomes

$$\int_0^{\varepsilon}\!\!\int_0^{\varepsilon} \frac{n(n-1)}{A(T(\mathcal{Y}_3))^2} \left(\frac{12\, r^2}{9 (r-1)^2}\right) z_1 z_3\, G(z_1, z_3)^{n-2}\, dz_1\, dz_3.$$

Since the new integral is critical when $z_1 = z_3 = 0$, we make the change of variables $z_j = w_j/\sqrt{n}$ for $j \in \{1, 3\}$; then
$G(z_1, z_3)$ becomes

$$G(w_1, w_3) = 1 - \frac{4 r}{3 n (r-1)} \left(w_1^2 + w_3^2 + 2 r (r-1)\, w_1 w_3\right),$$

so for sufficiently large $n$

$$p_r \approx P(E_n^{1,3}) \approx \int_0^{\sqrt{n}\,\varepsilon}\!\!\int_0^{\sqrt{n}\,\varepsilon} \frac{n-1}{n}\, \frac{64\, r^2}{9 (r-1)^2}\, w_1 w_3 \left[1 - \frac{4 r}{3 n (r-1)} \left(w_1^2 + w_3^2 + 2 r (r-1)\, w_1 w_3\right)\right]^{n-2} dw_3\, dw_1,$$

and letting $n \to \infty$,

$$p_r = \int_0^{\infty}\!\!\int_0^{\infty} \frac{64\, r^2}{9 (r-1)^2}\, w_1 w_3 \exp\left(-\frac{4 r}{3 (r-1)} \left(w_1^2 + w_3^2 + 2 r (r-1)\, w_1 w_3\right)\right) dw_3\, dw_1,$$

which is not analytically integrable, but $p_r$ can be obtained by numerical integration, e.g., $p_{r=\sqrt{2}} \approx .4826$ and
$p_{r=5/4} \approx .6514$.
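
The numerical integration mentioned here can be sketched as follows. The code assumes the integral form $p_r = \int_0^\infty\!\int_0^\infty \frac{64 r^2}{9(r-1)^2}\, w_1 w_3 \exp\left(-\frac{4r}{3(r-1)}(w_1^2 + w_3^2 + 2r(r-1) w_1 w_3)\right) dw_3\, dw_1$ for $1 < r < 3/2$. After the substitution $u = w\sqrt{4r/(3(r-1))}$ this reduces to $4 \int_0^\infty\!\int_0^\infty u v\, e^{-(u^2 + v^2 + 2r(r-1) u v)}\, du\, dv$, which is evaluated with a midpoint rule on a truncated square. The function name, truncation point and grid size are illustrative choices, not part of the original derivation.

```python
import numpy as np


def p_r(r: float, upper: float = 8.0, m: int = 800) -> float:
    """Midpoint-rule approximation of p_r for 1 < r < 3/2, using the
    rescaled integrand 4 u v exp(-(u^2 + v^2 + k u v)), k = 2 r (r - 1)."""
    k = 2.0 * r * (r - 1.0)
    h = upper / m
    u = (np.arange(m) + 0.5) * h  # midpoints of the grid cells on [0, upper]
    U, V = np.meshgrid(u, u, indexing="ij")
    integrand = 4.0 * U * V * np.exp(-(U**2 + V**2 + k * U * V))
    return float(integrand.sum() * h * h)


if __name__ == "__main__":
    for r in (1.25, 2 ** 0.5):
        print(r, p_r(r))
```

The values obtained this way can be compared with the figures quoted in the text for $r = 5/4$ and $r = \sqrt{2}$; the truncation at 8 is harmless because the integrand decays like $e^{-u^2 - v^2}$.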

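Next, we find $\lim_{n\to\infty} P(E_n^{2,3})$. The asymptotically accurate joint pdf of $Q_2(n), Q_3(n)$ is

$$f_{23}(\zeta) = \frac{n(n-1)}{A(T(\mathcal{Y}_3))^2} \left(\frac{A(T(\mathcal{Y}_3)) - A(S_R(\zeta))}{A(T(\mathcal{Y}_3))}\right)^{n-2}$$

where $\zeta = (x_2, y_2, x_3, y_3)$ and $S_R(\zeta)$ is the shaded region in the corresponding figure (left) whose area is

$$A(S_R(\zeta)) = \frac{\sqrt{3}\left(2 r y_3 + \sqrt{3}(1 - r)\right)^2}{12\, r (r-1)} + \frac{\sqrt{3}\left(\sqrt{3}\, r y_2 - 3 r x_2 - 3 r + 6\right)^2}{36 (2 - r) r}.$$

As before,

$$P(E_n^{2,3}) \approx \int \left(\frac{A(N_{PE}^r(q_2, M) \cup N_{PE}^r(q_3, M)) - A(S_R(\zeta))}{A(T(\mathcal{Y}_3)) - A(S_R(\zeta))}\right)^{n-2} f_{23}(\zeta)\, d\zeta = \int \frac{n(n-1)}{A(T(\mathcal{Y}_3))^2} \left(\frac{A(N_{PE}^r(q_2, M) \cup N_{PE}^r(q_3, M)) - A(S_R(\zeta))}{A(T(\mathcal{Y}_3))}\right)^{n-2} d\zeta,$$

where

$$A(N_{PE}^r(q_2, M) \cup N_{PE}^r(q_3, M)) = \frac{\sqrt{3}}{4} - \frac{\left(2 r y_3 - \sqrt{3}(r-1)\right)\left(3 - \sqrt{3}\, r y_2 + 3 r x_2 - 3 r\right)}{6}.$$

See the corresponding figure (right) for $N_{PE}^r(q_2, M) \cup N_{PE}^r(q_3, M)$. Let

$$G(\zeta) = \frac{A(N_{PE}^r(q_2, M) \cup N_{PE}^r(q_3, M)) - A(S_R(\zeta))}{A(T(\mathcal{Y}_3))}.$$

Note that the integral is critical when $x_2 = x_3 = m_1$ and $y_2 = y_3 = m_2$, since there $G(\zeta) = 1$.

We make the change of variables $(x_3, y_3) \to (x_3, m_2 + z_3)$ and $(x_2, y_2) \to (d(M, e_2) + z_2, y_2)$, i.e., $z_2 = d(q_2, e_2) - d(M, e_2)$, where $d(M, e_2) =
\frac{\sqrt{3}(2 - r)}{2r}$. Then $G(\zeta)$ becomes

$$G(z_2, z_3) = 1 - \frac{4 r z_2^2}{3 (2 - r)} - \frac{4 r z_3^2}{3 (r - 1)} - \frac{4\sqrt{3}\, r z_3 (3 - 2r)}{3} - \frac{8 r^2 z_2 z_3}{3}.$$

The new integral is

$$\int \frac{n(n-1)}{A(T(\mathcal{Y}_3))^2}\, G(z_2, z_3)^{n-2}\, dx_3\, dy_2\, dz_3\, dz_2.$$

The integrand is independent of $x_3$ and $y_2$, so integrating with respect to $x_3$ and $y_2$ yields $\frac{2\sqrt{3}\, r z_3}{3(r-1)}$ and $\frac{2\sqrt{3}\, r z_2}{3(2-r)}$,
respectively. Hence, for sufficiently large $n$

$$P(E_n^{2,3}) \approx \int_0^{\varepsilon}\!\!\int_0^{\varepsilon} \frac{n(n-1)}{A(T(\mathcal{Y}_3))^2} \left(\frac{4\, r^2}{3 (r-1)(2-r)}\right) z_3 z_2\, G(z_2, z_3)^{n-2}\, dz_2\, dz_3.$$

Note that the new integral is critical when $z_2 = z_3 = 0$, so we make the change of variables $z_2 = w_2/\sqrt{n}$ and
$z_3 = w_3/n$; then $G(z_2, z_3)$ becomes

$$G(w_2, w_3) = 1 - \frac{1}{n}\left(\frac{4 r w_2^2}{3 (2 - r)} + \frac{4\sqrt{3}\, r w_3 (3 - 2r)}{3}\right) + O\left(n^{-3/2}\right),$$

so for sufficiently large $n$

$$P(E_n^{2,3}) \approx \int_0^{\sqrt{n}\,\varepsilon}\!\!\int_0^{n \varepsilon} \frac{n-1}{n^2}\, \frac{64\, r^2}{9 (r-1)(2-r)}\, w_2 w_3 \left[1 - \frac{1}{n}\left(\frac{4 r w_2^2}{3 (2 - r)} + \frac{4\sqrt{3}\, r w_3 (3 - 2r)}{3}\right) + O\left(n^{-3/2}\right)\right]^{n-2} dw_3\, dw_2 = O\left(n^{-1}\right),$$

since

$$\int_0^{\infty}\!\!\int_0^{\infty} w_2 w_3 \exp\left(-\frac{4 r w_2^2}{3 (2 - r)} - \frac{4\sqrt{3}\, r w_3 (3 - 2r)}{3}\right) dw_3\, dw_2 = \frac{27 (2 - r)}{384\, r^3 (3 - 2r)^2},$$

which is a finite constant.

Thus we have shown that $P(E_n^{2,3}) \to 0$ as $n \to \infty$, which implies that as $n \to \infty$,

$$P(E_n^{2,3}) + P(E_n^{1,2}) - P(E_n^{1,2} \cap E_n^{2,3}) - P(E_n^{1,2} \cap E_n^{1,3}) - P(E_n^{1,3} \cap E_n^{2,3}) + P(E_n^{1,2} \cap E_n^{2,3} \cap E_n^{1,3}) \le 5\, P(E_n^{2,3}) \to 0.$$

Hence $\lim_{n\to\infty} P(\gamma_n(r, M) \le 2) = \lim_{n\to\infty} P(E_n^{1,3})$ and $\lim_{n\to\infty} P(\gamma_n(r, M) > 1) = 1$ together imply that

$$\lim_{n\to\infty} P(\gamma_n(r, M) = 2) = p_r. \ ■$$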